FOREWORD


The  Simulation  Systems  Technical  Area  of  the  Army  Research  Institute  for 
the  Behavioral  and  Social  Sciences  (ARI)  performs  research  and  development  in 
the  areas  of  training  devices  and  simulators  in  the  Army.  Of  special  interest 
is  research  concerning  the  evaluation  of  training  device  effectiveness. 

Throughout  the  acquisition  of  a  simulator  or  training  device,  training 
effectiveness  must  be  evaluated.  Ideally,  an  empirical  transfer  of  training 
test  would  provide  the  data  needed  for  an  evaluation.  However,  when  empirical 
data  cannot  be  obtained,  training  device  effectiveness  can  only  be  estimated 
using  analytic  methods. 

This  report  provides  a  critical  review  of  analytic  methods  recently 
developed  by  the  Army  for  the  evaluation  of  training  device  effectiveness. 

The  results  of  this  report  have  implications  for  training  developers  in  PM 
TRADE and TRADOC and for researchers in the field of training device effectiveness.


THE PREDICTION OF TRAINING DEVICE EFFECTIVENESS:
A  REVIEW  OF  ARMY  MODELS 


EXECUTIVE  SUMMARY 


Requirement:

To review the analytic models and methods developed by the Army for the
prediction of training device effectiveness, and to recommend procedures for
the development, validation, and application of improved models.


Procedure: 

Four  predictive  models,  known  collectively  as  TRAINVICE,  were  compared  in 
terms  of  their  implicit  assumptions,  analytic  procedures,  validity,  and  utility 
for  training-device  acquisition. 


Findings:

Despite their common purpose, the four TRAINVICE models differ considerably
in the task, equipment, and personnel variables they consider and in the
mathematical formulae used to calculate training effectiveness indices. The
major limitation shared by all of the TRAINVICE models is that they yield only
overall indices of effectiveness. The utility of such an index is strongly
questioned. The
recommendation  was  made  that  a  model  be  developed  which  would  permit  a  more 
detailed  assessment  of  training  device  effectiveness.  Ideally,  such  a  model 
would  generate  effectiveness  indices  for  individual  skills,  and  would  provide 
procedures  for  aggregating  the  skill  indices  into  separate  task  indices.  It 
was  concluded  that  separate  skill  and  task  indices  would  yield  effectiveness 
predictions  of  sufficient  detail  to  be  of  use  to  the  training  developer  in 
the  design,  evaluation,  and  implementation  of  training  devices. 

Model  application  and  development  will  require  research  to  be  done  in 
two  areas:  field  validation  of  the  TRAINVICE  models  in  various  task  domains; 
and  longer  range  investigation  of  the  models'  underlying  assumptions.  The 
latter  area  should  include  a  refinement  of  the  learning  guidelines  contained 
in  the  models,  and  specification  of  behavioral  criteria  which  are  suitable 
to  analytic  as  well  as  empirical  evaluation  of  training  device  effectiveness. 


Utilization  of  Findings: 

The  review  and  recommendations  will  be  of  use  to  the  training  developer 
wishing  to  use  one  of  the  existing  TRAINVICE  models  as  well  as  to  the  model 
developer  trying  to  improve  the  prediction  of  training  effectiveness. 


Introduction

This  report  examines  analytic  methods  and  models  for  the  evaluation  of 
training  device  effectiveness.  The  need  for  such  non-empirical  evaluation 
procedures  has  been  a  persistent  concern  of  military  training  developers  since 
the 1960s. In particular, the Army has recently developed a series of
models, known as TRAINVICE, which attempt to predict the degree to which
training on a particular training device will transfer to performance on
operational equipment. These models, which have evolved from a history of
military training research, are the principal focus of this paper.

Jeantheau  (1971)  reported  an  attempt  by  the  Navy  at  the  "qualitative 
assessment"  of  training  device  effectiveness.  The  forms  and  guidance  included 
in  this  document  permit  the  cataloging  of  training  device  features  and  expert 
opinions  on  those  features.  These  procedures  do  not,  however,  result  directly 
in  the  evaluation  of  a  particular  training  device.  Rather,  the  method  simply 
provides  a  format  for  collecting  and  using  information  on  training  devices. 

In  a  later  effort,  done  for  the  Army,  Caro  (1970)  developed  the  Task 
Commonality  Analysis  (TCA)  method  for  the  prediction  of  transfer  of  training 
from a device to operational equipment. The predictions were based on
"realism ratings" of the stimulus (display) and response (control) properties
of the training device. In deducing which tasks would be trained well (i.e.,
high transfer) and which would not, Caro adhered to Osgood's principles of
transfer. He assumed that if both the stimuli and responses in the training
situation were similar to those in the operational situation, then positive
transfer would result. Further, he assumed that if the stimuli were similar,
but the associated responses were different, then negative transfer would
occur. Caro's choice is not surprising since these assumptions are ubiquitous
in the field of training evaluation and are well represented in the TRAINVICE
models which are discussed below.

Caro's TCA method represented the state of the art when it was published.
It provided the impetus and much of the groundwork for the development of the
TRAINVICE models. Although TCA is similar to the TRAINVICE models
in  its  goal  and  in  some  of  its  assumptions,  it  will  not  be  treated  more  fully 
here  for  the  following  reasons.  The  realism  ratings  were  rudimentary  (see 
footnote  1)  and  were  not  based  on  clearly  articulated  criteria.  The  transfer 
predictions  consist  of  simple,  qualitative  statements  about  whether  or  not  a 
task  will  be  trained  well.  Furthermore,  the  judgmental  operations  required  to 
generate  the  predictions  have  not  been  reduced  to  a  formal  algorithm.  That 
is,  there  are  no  fixed  procedures  for  transforming  or  combining  data  to  arrive 
at  a  clear  prediction. 


Footnote 1: Raters were simply asked to judge whether or not a display or
control was "realistic". The realism score for a piece of equipment was the
percentage of raters who said that that piece of training equipment was
"realistic".

It was not until 1976 that the shortcomings of earlier approaches were
addressed. Between this time and 1980, the Army developed a family of
predictive models known collectively as TRAINVICE. In their attempt to generate
quantitative predictions of effectiveness through formal procedures, these four
models represent the most ambitious steps taken to date in the field of
analytic evaluation. The level of sophistication and the potentially great
utility of these models warrant a very close examination of the procedures,
assumptions, and validity of TRAINVICE.

The original method, developed in 1976, is referred to as TRAINVICE-A
(TV-A) in this report (Wheaton, Fingerman, Rose, and Leonard, 1976). In 1979,
the Honeywell Corporation modified TV-A as part of an effort to develop
detailed guidance for user application (PM-TRADE, 1979). This modified approach
is referred to as TRAINVICE-B (TV-B) in this report. Other modifications to
TV-A were developed by the U.S. Army Research Institute (Narva, 1979a, 1979b)
and are reported herein as TRAINVICE-C (TV-C). Finally, in an effort to
develop a user guidebook for applying TV-C, additional revisions were made
(Swezey and Evans, 1980). This approach is referred to as TRAINVICE-D (TV-D)
in this report.

Although  each  model  purports  to  provide  an  index  of  effectiveness,  or 
transfer  of  training  potential  for  a  device,  these  models  differ  in  several 
important  ways.  For  example,  the  variables  considered  in  the  calculation  of 
the  indices  are  given  different  degrees  of  emphasis  or  mathematical  weight  in 
each  model.  The  procedures  used  to  estimate  the  values  for  each  component 
vary  considerably  from  model  to  model.  Moreover,  the  procedures  used  to 
calculate  an  index  of  effectiveness  from  the  variable  values  are  also  very 
different  in  each  model. 

The  TRAINVICE  models  do,  however,  share  a  common  data  collection 
method.  This  method  consists  of  a  structured  interrogation  of  a  subject 
matter  expert.  As  such  the  models  place  a  very  high  premium  on  the  judgment 
of  an  expert.  The  method  focuses  decision-making  on  a  specific  set  of  issues 
for each task or part of a task. In the first of the TRAINVICE models, for
example, one of the issues considered is the similarity between the equipment
on a training device and the operational equipment used to perform a
particular subtask. This issue is further delineated into physical similarity
(appearance, location, etc.) and functional similarity (amount of information
flow between the human operator and the controls and displays). For each of
these (i.e., physical and functional similarity), the expert assigns values
from a rating scale which ranges from "0" to "3". Guidance is provided by a
description of the criteria associated with each value (e.g., a "3" means
identical to operational equipment). This procedure continues until all
equipment (i.e., displays/controls) associated with all subtasks have been
rated.  An  analogous  rating  procedure  is  performed  for  all  variables  in  the 
model  pertaining  to  each  subtask.  In  this  manner,  the  subject  matter  expert 
can  estimate  numerical  values  for  each  predictor  variable  considered  by  the 
model  (e.g.,  similarity,  training  techniques,  task  difficulty,  etc.).  These 
estimated  values  for  the  variables  are  then  entered  into  a  general  formula 
which  results  in  an  overall  figure  of  merit  (index  of  training  effectiveness) 
for  the  training  device  in  question. 

The judgments of the subject matter expert and the index of effectiveness
rely on many assumptions, both theoretical and mathematical in nature. The
theoretical assumptions include: a) what is being predicted (e.g., a
particular measure of transfer of training); and b) which task and equipment
variables have the predictive power to generate such a measure of effectiveness.
The mathematical assumptions concern: a) the manner in which all the values
are combined (e.g., weighting strategies, etc.); and b) the numerical
properties of the rating scales used to estimate those values. As indicated
earlier, the four analytic models reviewed in this report differ considerably
in the assumptions made and in the forms in which the assumptions are
manifested.

Section  II  of  this  report  contains  a  detailed  description  of  each  model, 
taken  individually.  Section  III  is  a  general  summary  and  critique  of  all  four 
models,  in  which  differences  among  the  models  are  discussed  in  detail. 
Finally,  based  upon  the  results  of  the  critical  review,  future  directions  are 
discussed  in  Section  IV. 


II

Models


TRAINVICE-A (TV-A)

Overview

The Wheaton, Fingerman, Rose, and Leonard (1976) approach, TV-A, is an
attempt to predict and evaluate training device effectiveness, specifically
transfer of skills from training to operational settings, by combining
judgments about a variety of factors. Judgments are transformed into values
related to the interactions among device design and use, trainee ability, and
training strategy. Effectiveness, therefore, is assumed to be a function of
three factors:

1. Transfer Potential: potential for transfer of training using a
particular device, which is determined by the:

a. overlap or communality of the skills taught on a device and those
necessary to perform on the operational equipment, and

b. physical and functional similarity between a device and the
operational equipment;

2. Learning Deficit: differences (i.e., deficits) between a trainee's
knowledge before training on a device and what must be known about the
operational equipment, weighted by the difficulty of acquiring such knowledge;
and

3. Training Techniques: appropriateness of training techniques or device
features incorporated into a device, and how well these features adhere to
accepted principles of learning.

The  Wheaton,  Fingerman,  Rose,  and  Leonard  (1976)  model  combines  values  of 
judgments  made  for  each  of  the  above  factors  into  an  index  reflecting  the 
effectiveness  of  a  device. 


Process 


Generating forecasts about the effectiveness of a training device,
according to Wheaton, Fingerman, Rose, and Leonard (1976), requires analyses of
three components of the device: Transfer Potential, Learning Deficit, and
Training Techniques. These components are subsumed under one of
the three categories in the structural and functional model of this training
device effectiveness approach. The process of analyzing each of the
components (i.e., Transfer Potential, Learning Deficit, Training Technique)
requires judgments (see footnote 2) to be made for five basic analyses of:

1. Task Communality
2. Physical Similarity
3. Functional Similarity
4. Learning Deficit
5. Training Technique

Values for these analyses are derived for a device under evaluation, then,
where appropriate, compared to the operational equipment for which a device
was developed.

Footnote 2: See Appendix for the rating scales.


Inputs 

Before performing the procedures by which values of the TV-A variables
are estimated, a user needs a list of training objectives and relevant task
analytic information for both an operational setting and a training device
being evaluated. TV-A requires that most analyses be conducted at the
subtask level and some analyses at the level of the skills and knowledges which
comprise each subtask. A subtask, according to Folley's (1964) definition, is
"...an activity that is performed by one person and bounded by two events"
(Wheaton, Fingerman, Rose, and Leonard, 1976, p. 16). The value of each TV-A
variable is estimated for each subtask identified in the operational task
analysis.


Procedures 


Task  Communality  Analysis  (C) 

Task Communality Analysis (C) assesses the overlap between the subtasks
trained on a device and those performed on the operational equipment. The
value of C is determined by comparing operational and training device task
analyses with each other. In this procedure, a training device is given a
rating of "1" for each operational subtask i it covers or "0" for each
operational subtask it fails to cover. Since a
value of "0" decreases the sum in the numerator of the final prediction
formula, the task communality rating serves to penalize a training device for
each operational subtask not covered. A training device is not, however,
penalized for including subtasks which are not in the operational environment
(i.e., additional subtasks).

In the overall device effectiveness prediction formula, the sum of C
values for a device is compared to values for subtasks on the operational
equipment. Since this comparison is made against the operational equipment,
C_i always equals "1" for the operational equipment.


Physical  Similarity  Analysis  (PSA) 

The Physical Similarity Analysis (PSA) and the Functional Similarity
Analysis (FSA), discussed below, combine to form the Similarity (S) component
in the predictive equation. The degree of similarity (S_i) between a training
device and the operational equipment is the average of the values assigned to
the fidelity variables: physical and functional similarity. The PSA allows for
judgments concerning the physical characteristics (i.e., appearance, size,
location, etc.) of displays and controls used in training specific behavioral
performance on a device. The Functional Similarity Analysis is concerned with
the information processing activities of the human who is viewing the displays
and operating the controls.

The information required to perform the PSA is a list of all displays and
controls on the operational and training equipment relevant to each subtask.
The displays and controls corresponding to each subtask are given a rating by
judging how well the operational equipment is represented in a training
device: from "0" (not represented) to "3" (identical to operational
equipment). These ratings are averaged across controls and displays and
divided by "3" to yield a physical similarity index ranging between "0" and
"1".

Like the C analysis, the PSA value for the operational equipment is "1". The
rationale is that the operational equipment represents the maximum degree of
physical similarity. A PSA value of "1", therefore, is assigned to all
displays and controls on the operational equipment corresponding to the
subtasks.
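
The PSA arithmetic described above can be sketched in Python. This is a minimal illustration rather than part of the TV-A documentation: the function name `similarity_index` and the example ratings are hypothetical.

```python
def similarity_index(ratings, max_rating=3):
    """Average the 0-3 ratings over all displays and controls of one
    subtask, then divide by the top of the scale to get a 0..1 index."""
    if not ratings:
        raise ValueError("at least one display or control rating is required")
    if any(r < 0 or r > max_rating for r in ratings):
        raise ValueError("ratings must lie between 0 and max_rating")
    return sum(ratings) / len(ratings) / max_rating


# Hypothetical subtask with four displays/controls rated by an expert.
device_psa = similarity_index([3, 2, 0, 3])   # mean 2.0, scaled to 2/3
# Operational equipment is rated "3" throughout, so its index is 1.
operational_psa = similarity_index([3, 3, 3, 3])
```

Because the functional similarity ratings use the same "0" to "3" scale and the same averaging, the same helper would apply to the FSA ratings under these assumptions.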


Functional  Similarity  Analysis  (FSA) 


Like the PSA, the Functional Similarity Analysis (FSA)
requires a list of operational subtasks and corresponding displays and
controls. A flow diagram for each subtask is then generated indicating the
type, amount, and direction of information to and from the operator for each
control and display. The amount of information (in "bits") is determined by
the number of stimulus (i.e., information transmitted from a display to an
operator) and response (i.e., information transmitted from an operator to a
control) states which displays or controls can assume. The remainder of this
analysis consists of rating differences between the amount of information in
an operational setting (H_OS) and that in a training setting (H_TS) (see
footnote 3). For each
control and display, a training device is given a rating: from "0" (missing)
to "3" (identical: H_OS = H_TS). Ratings for controls and displays are then
averaged and divided by "3" to give a functional similarity index for each
subtask which ranges between "0" and "1". The overall similarity index for
each subtask (S_i) is the average of the Physical and Functional Similarity
Indices, (P + F)/2.

The FSA value for each display and control on operational equipment, as in the
other analyses discussed thus far, is always "1".
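
The two quantitative steps in this analysis can be illustrated with a short Python sketch. It assumes that all states of a display or control are equally likely, so that n states carry log2(n) bits; that assumption, the function names, and the example values are illustrative and are not taken from the TV-A documentation. The "0" to "3" judgment itself depends on the Appendix rating scales and is not modeled here.

```python
import math


def information_bits(n_states):
    """Bits carried by a display or control with n equally likely states
    (assumption: equiprobable states, so H = log2(n))."""
    if n_states < 1:
        raise ValueError("a display or control needs at least one state")
    return math.log2(n_states)


def overall_similarity(physical_index, functional_index):
    """S_i for one subtask: the average of the two 0..1 similarity indices."""
    return (physical_index + functional_index) / 2


# Hypothetical control: 8 positions on the operational equipment,
# 4 positions on the training device.
h_os = information_bits(8)
h_ts = information_bits(4)

# Hypothetical indices for one subtask.
s_i = overall_similarity(physical_index=1.0, functional_index=0.5)
```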


Learning  Deficit  Analysis  (D) 


The Learning Deficit (D) index requires that, for every subtask, each skill
and knowledge be given two ratings (rating scales adapted from Demaree, 1961;
see Appendix). The first rating, the repertory scale (RS), assesses the degree
to which trainees are already proficient in the skills and knowledges to be
taught. A rating from "0" (no experience) to "4" (complete understanding) is
assigned to each skill and knowledge. An estimate is then made of the levels
of proficiency required of a trainee, for each skill and knowledge, in order
to perform a particular subtask to criterion. Accordingly, a criterion scale
(CS) value is assigned to each skill and knowledge: from "0" (no experience)
to "4" (complete understanding). The criterion scale value minus the
repertory scale value (CS - RS) then represents the learning deficit for each
skill and knowledge. The learning deficit index (LD) for each subtask is simply
the average of the learning deficit values of all skills and knowledges
involved:

$$LD = \frac{\sum_{i=1}^{S+K} (CS_i - RS_i)}{S + K}$$

where S + K is the number of skills and knowledges involved in the subtask.
LD ranges between "0" and "4".

Footnote 3: The FSA requires the rater to compare amounts of information in
log2 units, a potentially difficult procedure for users who are unfamiliar
with information theory. This shortcoming was addressed in TV-B's
revision of the Functional Similarity Analysis.

The Learning Deficit value for each subtask is then weighted by the
difficulty of training the skills and knowledges necessary for that subtask
(i.e., how hard it is to overcome the learning deficit). To do this, each
subtask is ranked according to the amount of time required to train that
subtask on the operational equipment (a rank of "1" for the easiest subtask;
higher ranks for subtasks requiring more training time). The learning
deficit value for each subtask is multiplied by its rank, then divided by "4"
times the total number of subtasks. This procedure yields a weighted
learning deficit value (D_i) for each subtask which ranges between "0" and
"1". A D value is computed once for the operational subtasks because these
values are applicable to both a training device and operational equipment.
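
The learning deficit calculation and its rank weighting can be sketched as follows. The function names and the ratings are hypothetical and serve only to show the arithmetic described above.

```python
def learning_deficit(criterion, repertory):
    """LD for one subtask: mean of (CS - RS) over its skills and knowledges."""
    if not criterion or len(criterion) != len(repertory):
        raise ValueError("need one CS and one RS rating per skill or knowledge")
    return sum(cs - rs for cs, rs in zip(criterion, repertory)) / len(criterion)


def weighted_deficits(deficits, ranks):
    """D_i for each subtask: LD times training-time rank, over 4 * N."""
    if len(deficits) != len(ranks):
        raise ValueError("need one rank per subtask")
    n = len(deficits)
    return [ld * rank / (4 * n) for ld, rank in zip(deficits, ranks)]


# Hypothetical ratings on the 0-4 scales for two subtasks.
ld_a = learning_deficit(criterion=[4, 3, 2], repertory=[1, 3, 0])  # (3 + 0 + 2) / 3
ld_b = learning_deficit(criterion=[2, 2], repertory=[1, 1])        # (1 + 1) / 2

# Subtask A takes longer to train, so it receives the higher rank.
d_values = weighted_deficits([ld_a, ld_b], ranks=[2, 1])
```

Dividing by 4 times the number of subtasks keeps each D_i at or below 1, since LD cannot exceed 4 and a rank cannot exceed the number of subtasks.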


Training Techniques Analysis (T)

In the Training Techniques (T) analysis, a training device is rated on
how well it implements established learning principles. The first step is to
assign one or more task taxonomic labels (after US Naval Training Device
Center, 1972) to each operational subtask, using the skills and knowledges
comprising each subtask. Associated with each of the thirteen task
categories in the taxonomy are three sets of learning principles which are
related to stimulus, response, and feedback aspects of these tasks (after
Willis and Peterson, 1961; and Micheli, 1972). For each subtask, ratings are
given on how well a training device implements each of the relevant learning
principles: "-3" (complete violation of principle); "0" (principle not
implemented or violated); "3" (optimal implementation of principle). The
lowest ratings given to learning principles in each category (i.e., stimulus,
response, and feedback) are then averaged to yield a T score for each subtask.
In order to scale T to between "0" and "1", "3" is added to the averaged
score, and the sum is divided by "6".

As Wheaton, Fingerman, Rose, and Leonard (1976) pointed out, the
determination of T values is rather conservative since only the poorest
implementation of training techniques on a device is considered. In the TV-A
procedure, a training device does not get credit for having a few, especially
good instructional features.

The operational equipment is assumed to make optimal use of training
techniques; therefore, T_i = 1.
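
A short Python sketch of the T score for one subtask is given here. The function name and the ratings are hypothetical, and the sketch assumes each of the three categories has at least one relevant learning principle.

```python
def training_technique_score(stimulus, response, feedback):
    """T for one subtask: take the lowest rating in each category,
    average the three minima, then rescale from -3..3 to 0..1."""
    categories = (stimulus, response, feedback)
    if any(len(ratings) == 0 for ratings in categories):
        raise ValueError("each category needs at least one rated principle")
    lowest = [min(ratings) for ratings in categories]
    average = sum(lowest) / len(lowest)
    return (average + 3) / 6


# Hypothetical ratings: one poorly implemented stimulus principle (-1)
# outweighs the well implemented one (3) because only the minimum counts.
t_device = training_technique_score(stimulus=[3, -1], response=[2], feedback=[0, 3])

# Operational equipment is assumed optimal on every principle.
t_operational = training_technique_score(stimulus=[3], response=[3], feedback=[3])
```

Taking the minimum within each category is what makes the score conservative: strong features cannot offset the weakest implementation in the same category.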


Outputs 


Indices 


Each of the TV-A analyses can be calculated and collapsed across
subtasks to derive a separate index purporting to assess Transfer Potential,
Learning Deficit, or Training Techniques. Calculation of these indices may
serve a diagnostic function by locating deficiencies or assets in a training
device. For a detailed discussion of these indices, the reader is referred
to Research Memorandum 76-16 (Wheaton, Fingerman, Rose, and Leonard, 1976).
Such a presentation is beyond the scope of this effort.


Overall  Device  Effectiveness  Prediction 


The  developers  of  the  TV-A  model  tried  to  predict  the  Gagne,  Foster,  and 
Crowley  (1948)  measure  of  transfer: 

$$T = \frac{C - E}{C}$$


In this classic transfer of training paradigm, both C and E are measures of
practice (time, trials, errors) required on operational equipment in order
to meet a performance criterion. C represents a control group, which
practiced only on operational equipment. E represents an experimental group,
which  practiced  on  a  simulator  or  training  device  first,  then  transferred  to 
operational  equipment.  The  question  which  this  transfer  equation  attempts  to 
answer  is:  How  much  training  time  (i.e.,  on  operational  equipment)  can  be 
saved  by  providing  practice  on  a  simulator? 


T is, therefore, a measure of savings. It equals the amount of training
time on operational equipment saved by practicing on a simulator first (C - E),
as a proportion of training time required when operational equipment alone is
used (C). T varies between -∞ and +1. In theory, the closer T is to +1, the
greater  the  transfer  of  skills  acquired  with  a  simulator  to  operational 
equipment. 
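
The savings measure can be illustrated numerically. The function name and the hour values in this sketch are hypothetical.

```python
def transfer_savings(control, experimental):
    """T = (C - E) / C, where C and E are amounts of practice (time, trials,
    or errors) needed on operational equipment to reach criterion."""
    if control <= 0:
        raise ValueError("control-group practice must be positive")
    return (control - experimental) / control


# Hypothetical hours to criterion on the operational equipment.
positive = transfer_savings(control=10, experimental=6)    # device saves 4 of 10 hours
none = transfer_savings(control=10, experimental=10)       # no savings
negative = transfer_savings(control=10, experimental=15)   # device training hurts
```

The third case shows why T has no lower bound: the more extra practice the experimental group needs, the more negative T becomes.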

Wheaton, Fingerman, Rose, and Leonard (1976) attempted to predict T
directly by estimating values for C and E, and substituting these into the
original transfer equation. In order to do this, it was assumed that
training time (regardless of experimental conditions) is a function of: (1) how
well a training setting represents the operational (real world) situation,
both in terms of tasks covered in training and fidelity of the training
setting; (2) the difficulty inherent in the tasks which must be learned to some
criterion; and (3) the appropriateness (or value) of the instructional
techniques used to train the tasks. The first factor is represented in TV-A by
two variables, a coverage variable, C (task communality), and a similarity
variable, S (physical and functional similarity). The second is represented
by the learning difficulty variable, D, and the third by the training
variable, T. As Wheaton, Fingerman, Rose, and Leonard (1976) stated, "The time,
trials, or errors to a criterion on subtask i is assumed to be a linear
function of C_i x S_i x D_i x T_i" (p. 48).




Since the training setting for a control group, however, is the
operational equipment, it is clear that all operational subtasks are covered by
the equipment (C_i = 1), and the physical and functional similarity is
identical for each subtask (S_i = 1). It is also assumed in a TV-A
application that, when training takes place on operational equipment, the
instructional techniques used are optimal (T_i = 1). These assumptions mean
that the amount of practice required by a control group (C) is determined
solely by the difficulty of each subtask (D_i) summed over all N subtasks:

$$C = \sum_{i=1}^{N} D_i$$

In order to estimate E for the experimental group, the amount learned on
a training device must be subtracted from the amount learned on operational
equipment. Since a training device is assumed not to be identical to the
operational equipment, the values of the coverage (C_i) and similarity (S_i)
variables will not always be "1", and must be estimated by the procedures just
discussed. Likewise, because the training techniques employed to teach each
subtask are assumed to be less than optimal when a training device is used,
T_i must also be estimated. The amount learned on operational equipment is

$$\sum_{i=1}^{N} D_i$$

Therefore E is assumed to be equivalent to:

$$E = \sum_{i=1}^{N} D_i - \sum_{i=1}^{N} C_i \times S_i \times D_i \times T_i$$

Given these estimated values of C and E, the predicted value of T, written
$\hat{T}$, is calculated by the equation:

$$\hat{T} = \frac{C - E}{C} = \frac{\sum_{i=1}^{N} C_i \times S_i \times D_i \times T_i}{\sum_{i=1}^{N} D_i}$$
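
The complete prediction can be traced end to end with a small Python sketch. The function name, the tuple layout, and the subtask values are hypothetical; the arithmetic follows the estimates of C and E given above.

```python
def predicted_transfer(subtasks):
    """Predicted transfer for a device.

    subtasks: list of (C_i, S_i, D_i, T_i) tuples, one per operational
    subtask, each value already scaled to 0..1 as described in the text.
    """
    # Control group: practice depends only on subtask difficulty.
    control = sum(d for _, _, d, _ in subtasks)
    if control <= 0:
        raise ValueError("the summed learning deficit must be positive")
    # Amount learned on the training device.
    learned = sum(c * s * d * t for c, s, d, t in subtasks)
    # Experimental group: what is left to learn on operational equipment.
    experimental = control - learned
    return (control - experimental) / control


# Hypothetical device with three operational subtasks.
device = [
    (1, 0.5, 0.5, 1.0),    # covered, moderate similarity, optimal techniques
    (0, 1.0, 0.25, 1.0),   # not covered, so it contributes nothing
    (1, 1.0, 0.25, 0.5),   # covered, identical equipment, weaker techniques
]
t_hat = predicted_transfer(device)   # (0.25 + 0 + 0.125) / 1.0

# Operational equipment scores 1 on C, S, and T for every subtask.
t_hat_operational = predicted_transfer([(1, 1.0, d, 1.0) for _, _, d, _ in device])
```

The uncovered subtask illustrates the penalty described for task communality: its difficulty still counts in the denominator but adds nothing to the numerator.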


Summary 

The Wheaton, Fingerman, Rose, and Leonard (1976) model purports to
generate a prediction of transfer of training potential for training devices
based on an analysis of both operational and training equipment. The model
aggregates values for a series of factors assumed to be related to a device's
effectiveness. The factors identified are task communality, similarity,
learning deficits of the trainees, difficulty of each task to be trained, and
the training techniques incorporated into a device. The final evaluation
index or figure of merit is a value ranging from "0" to "1.0", with values
approaching "1" indicating greater transfer potential and, therefore, greater
effectiveness.

In reviewing the TV-A model it is important to note that the theoretical
assumptions and specific methodology were based on previous efforts (e.g.,
Wheaton, Rose, Fingerman, Korotkin, and Holding, 1974, 1976). Some of these
assumptions may be questioned, and one might consider some elements missing.
TV-A, however, represents one of the most systematic and complete methods for
assessing device effectiveness. In fact, Wheaton, Fingerman, Rose, and
Leonard (1976) have themselves begun a critical assessment and have suggested
directions for future efforts. For example, they recommended consideration
of some important external variables. These include the amount of training
and practice provided and user acceptance of a device. While these
considerations are external to a device, they represent variables which can
influence device effectiveness.

An additional device-related variable that may be considered for
inclusion in a model is what Wheaton, Fingerman, Rose, and Leonard (1976) call
Environmental Fidelity Analysis (EFA). The EFA would potentially account for
special or adverse conditions which may affect performance. These conditions
may include extreme temperature, reduced visibility, etc. (Wheaton,
Fingerman, Rose, and Leonard, 1976). It might be possible to obtain judgments or
estimates of degradation of performance, probability of occurrence, and
subtasks affected by such conditions. A future model may, for example, include
an assessment or estimate of how well a device prepares for such
contingencies. The difficulty, of course, is that a device developer may not be
able to replicate such conditions, assuming they are known, and a researcher
may not be knowledgeable about human behavior under the same circumstances.

A  future  revision  of  TV-A  might  include  a  less  laborious  approach  to  the 
Training  Techniques  Analysis.  Wheaton,  Fingerman,  Rose,  and  Leonard  (1976) 
suggested  that  perhaps  this  analysis  could  be  conducted  at  the  subtask  rather 
than  the  skill/knowledge  level.  This  possibility  becomes  more  realistic  when 
considering  recent  evidence  of  the  utility  of  such  approaches  as  cluster 
analysis  in  ranking  job  related  tasks  (see  Boldovici,  Boycan,  Fingerman,  and 
Wheaton,  1979;  Wheaton,  Fingerman,  and  Boycan,  1978).  From  such  analyses, 
it  should  be  possible  to  form  clusters  of  skills/knowledges  or  subtasks  which 
can generalize to entire tasks. Applications of a Training Techniques
analysis, therefore, could be conducted on a restricted number of subtasks,
and thus make analysis easier.

The  evaluation  methodology  presented  in  this  section  remains  to  be 
validated  both  in  terms  of  predictive  ability  and  the  constructs  within  the 
method.  As  one  reviews  the  literature  in  this  area,  this  criticism  applies 
to other revisions of TV-A as well as to alternative approaches. It has become
apparent, and will be discussed in the last section of this paper, that
evaluations of the various approaches have been long overdue and represent a
situation that must be remedied.

The methodology discussed thus far is based on a variety of assumptions,
some of which are accepted while others may require further justification.
TV-A assumes, for example, a linear relationship between the component
variables and the transfer of training potential of a device. This assumption is
presently accepted, particularly in the absence of any compelling reason to
do otherwise. Another assumption made is that equipment similarity (i.e.,
fidelity) is monotonically related to transfer and is, therefore, a valid
predictor variable. This is also related to the assumption that operational
equipment represents an optimal training setting against which a device may
be compared. There is presently no evidence to support these notions. An
opposing perspective may assume that training devices are typically built
with instructional features which are not present when operational equipment
is used for instruction. In addition, training devices can be built to simulate
the range of conditions a trainee may encounter on the job; this may not
be possible when using operational equipment.

Criticisms which are not unique to the TV-A model include the detailed
input requirements (e.g., task analytic data) and the premise that device
effectiveness seems limited to transfer of training. In addition, all
approaches reviewed for this paper mathematically combine a number of
variables into a final, overall index. These criticisms remain unresolved and
must be addressed in the near future.

TRAINVICE-B  (TV-B) 

Overview 

The TRAINVICE-B (TV-B) model assumes that a device is the appropriate
medium for training based on the media selection decision procedures
specified in the Training Device Requirement Documents Guide (1979). Within the
media selection decision procedures, a training developer has previously analyzed
and organized tasks, skills and knowledges, and objective data in formulating a
training device concept. The TV-B approach is purported to ensure that
established training requirements, incorporated into a device, are emphasized.

TV-B  provides  an  approach  to  analyze  and  evaluate  the  effectiveness  of  a 
training  device,  typically  in  comparison  to  alternative  device  concepts  or 
already existing devices. Applying the TV-B approach results in an
effectiveness score for each alternative device concept, which is then used to
decide which concept should be developed further.

The  TV-B  approach,  therefore,  is  embedded  in  a  series  of  administrative 
procedures  designed  to  establish  the  need  for  a  device,  determine  if  a  device 
which potentially may serve a training function already exists, and to
evaluate either existing devices or device concepts in terms of effectiveness.

For example, in deciding whether a device is an appropriate training
medium, a developer would have already collected information regarding tasks,
task elements, and controls and displays. The application of TV-B, in
effect, becomes a trade-off analysis, because a device is not expected to meet
all task training requirements. To the extent that a device does not address
all the requirements, a developer is provided with a methodology to assess
alternative concepts.

The  TV-B  methodology  is  similar  to  the  TV-A  approach.  A  rating  of  the 
correspondence  between  the  operational  equipment  and  a  training  device  is 
combined  with  an  index  of  the  extent  of  training  required  and  ability  level 
of the trainees. The product of these values becomes the training device
effectiveness index. In TV-B, however, when an existing device is compared to
a  training  concept  or  requirement,  the  effectiveness  index  may  be  adjusted 
for  providing  additional  training  beyond  that  required.  The  assumption  is 
that  training  additional  skills  represents  unnecessary  costs  which  lead  to  a 
loss  of  effectiveness. 


Procedures 

The TV-B methodology allows values to be assigned to components which
comprise two basic subdivisions: (1) device characteristics and (2) personnel
and training requirements. These subdivisions are further divided into the
following components:

Device Characteristics

     o  Task Commonality
     o  Physical Similarity
     o  Functional Similarity

Personnel & Training Requirements

     o  Skills and Knowledges Requirements
     o  Task Training Difficulty

Values for these components are combined to form an index of training device
effectiveness.


The  information  required  to  perform  a  TV-B  analysis  includes  the: 

1)  list of tasks and elements (i.e., subtasks) to be trained
(operational tasks);

2)  tasks  and  task  elements  which  can  be  trained  with  a  particular 
device.  This  list  includes  task  elements  covered  by  a  training 
device,  which  are  not  contained  in  the  training  objectives  (i.e. 
unique  elements); 

3)  skills  and  knowledges  required  to  meet  the  training  objectives; 

4)  controls  and  displays  used  to  perform  the  tasks  in  the 
operational  setting;  and 

5)  controls  and  displays  in  the  training  device. 


Task  Commonality  Analysis  (TC) 

The Task Commonality (TC) analysis in TV-B is different from the C analysis
in TV-A. In TV-A, each subtask is given a "1" or a "0" depending on
whether it is covered by a training device. In TV-B, a TC value is determined
for each task by rating whether or not the task elements which require
training are covered on a device ("1": covered, "0": not covered). The TC
value for a task is calculated by adding all task element ratings and
dividing this sum by the total number of required task elements plus the
number of task elements which do not require training but are still covered
by the training device (i.e., unique elements).
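
The TC calculation described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the function name,
argument names, and the sample ratings are hypothetical and do not come from
the TV-B documentation.

```python
def task_commonality(element_covered, n_unique_elements):
    """Return the TC value for one task.

    element_covered: one 0/1 rating per task element that requires training
        (1 = covered by the device, 0 = not covered).
    n_unique_elements: number of elements covered by the device that do not
        require training (the "unique" elements).
    """
    denominator = len(element_covered) + n_unique_elements
    if denominator == 0:
        raise ValueError("task has no required or unique elements")
    return sum(element_covered) / denominator


# Hypothetical task: four required elements, three of them covered,
# plus one unique element on the device.
print(task_commonality([1, 1, 0, 1], n_unique_elements=1))  # 3 / (4 + 1) = 0.6
```

Note that a device covering every required element still scores below 1 if it
also covers unique elements, which is the penalty the model intends.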


Physical  Similarity  Analysis  (PS) 

In  the  Physical  Similarity  (PS)  analysis  the  controls  and  displays  on  a 
training  device  and  on  the  operational  equipment  are  compared  in  terms  of 
their  appearance,  size,  location,  etc.  The  comparison  is  made  only  for  device 
characteristics  which  are  directly  involved  in  performing  those  task  elements 
which  require  training.  Each  control  or  display  on  a  training  device  is  rated 
on  the  degree  of  physical  similarity  (i.e.,  fidelity)  between  it  and  the 
corresponding  control  or  display  on  the  operational  equipment.  The  rating 
scale,  used  for  this  purpose,  ranges  from  "0"  (missing)  to  "3"  (identical). 
The scale values and criteria for judgments are very similar to those in
TV-A. There are, however, changes in phrasing; e.g., "small noticeable
differences" in place of the more traditional and, perhaps, technical "just
noticeable differences" (see Appendix).

In  order  to  derive  a  Physical  Similarity  index  for  each  task,  the  ratings 
given  to  controls  and  displays  on  a  device  are  totalled.  This  sum  is  then 
divided  by  a  combination  of  "2"  times  the  total  number  of  required  controls 
and  displays  plus  the  number  of  "unique"  controls  and  displays.  The  unique 
pieces  of  equipment  on  a  device  are  those  used  for  task  elements  or  skills 
which  are  associated  with  the  task  in  question,  but  do  not  require  training. 
Thus, the resulting index varies between "0" and "1", representing the
physical similarity adjusted for extra or "unique" equipment.

Functional  Similarity  Analysis  (FS) 

The Functional Similarity (FS) analysis in TV-B, like that in TV-A,
compares the controls and displays of a training device to those in the
operational equipment in terms of the amount of information conveyed from or to
the human operator. Just as in the PS analysis, each of the "required"
controls or displays relevant to a particular task receives a rating from "0" to
"2". The rating scale used, though similar to that in TV-A, includes less
technical language. A "2" on the TV-A scale, for example, means that the
amounts of information in the operational and training settings are "within one
log2 unit of each other." The corresponding description in TV-B is "the
number of states in the training situation is less than half of the number of
states in the operational setting." The only time the two scales are
equivalent is when there is less information in a training setting. The log2
criterion in TV-A can also mean greater information in a training setting. This
distinction, however, is consistent with the TV-B approach in adjusting for
unique skills.

In order to calculate the functional similarity index for each task, the
ratings given to all controls and displays on a device are summed and the
total is divided by the number of required controls and displays plus the
unique ones. This results in an index ranging from "0" to "2". The last
operation (i.e., the inclusion of the unique displays and controls in the
denominator) is the cost adjustment for extra training device features.
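
The PS and FS indices share one form: the summed ratings divided by a count of
required controls and displays plus the unique ones. A minimal Python sketch
follows. The function name, argument names, and sample ratings are
hypothetical, and the multiplier on the required count is passed in explicitly
(2 for PS as stated in the text, 1 for FS) rather than hard-coded.

```python
def similarity_index(ratings, n_required, n_unique, required_weight):
    """Summed ratings over (required_weight * n_required + n_unique).

    ratings: one rating per control or display on the device for this task.
    n_required: number of required controls and displays.
    n_unique: number of "unique" controls and displays, i.e. those tied to
        task elements that do not require training.
    required_weight: multiplier applied to the required count.
    """
    denominator = required_weight * n_required + n_unique
    if denominator <= 0:
        raise ValueError("no required or unique controls/displays")
    return sum(ratings) / denominator


# Hypothetical task with four required items and one unique item.
ps = similarity_index([3, 2, 3, 1], n_required=4, n_unique=1, required_weight=2)
fs = similarity_index([2, 1, 2, 2], n_required=4, n_unique=1, required_weight=1)
print(ps, fs)
```

In both cases adding a unique control or display enlarges the denominator
without adding to the numerator, so the index falls.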

Skills  and  Knowledges  Requirements  Analysis  (SKR) 


In  TV-B,  there  are  two  separate  preparatory  analyses  which  correspond  to 
the  Learning  Deficit  Analysis  in  TV-A.  In  TV-A,  the  Learning  Deficit  variable 
represents  an  estimate  of  how  much  the  trainees  have  to  learn,  weighted  by  the 
amount  of  time  it  takes  to  train  them  to  overcome  a  deficit,  on  the 
operational  equipment.  The  procedures  involved  in  both  TV-B  and  TV-A  are 
performed  independently  of  the  characteristics  of  the  training  device  under 
evaluation. 

In  Skills  and  Knowledges  Requirements  Analysis  (SKR),  each  skill  or 
knowledge  required  to  perform  a  task  receives  two  ratings.  The  first  rates 
the  level  of  proficiency  trainees  have  before  training.  The  second  rates 
the proficiency level required after training. The rating scales used range
from "0" (no experience) to "4" (complete understanding) and are almost
identical to the Repertory and Criterion Scales used in TV-A (see Appendix).
The difference in before and after proficiency levels is determined for each
skill or knowledge by subtraction. For each task, an SKR index is calculated
by taking the mean of the proficiency differences and scaling it down to
between "0" and "1" by dividing by 4.
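
As an illustration of the SKR arithmetic, the sketch below computes the index
for one task in Python. The function and argument names and the sample ratings
are hypothetical; only the subtraction, the mean, and the division by 4 come
from the description above.

```python
def skr_index(before, after, scale_max=4):
    """Mean proficiency gap for one task, scaled to the range 0..1.

    before: proficiency rating (0..4) of the trainees before training,
        one entry per skill or knowledge.
    after: proficiency rating (0..4) required after training, same order.
    """
    if len(before) != len(after) or not before:
        raise ValueError("need one before/after pair per skill or knowledge")
    gaps = [a - b for b, a in zip(before, after)]
    return (sum(gaps) / len(gaps)) / scale_max


# Hypothetical task with three skills: gaps of 4, 2, and 0 give a mean of 2.
print(skr_index(before=[0, 1, 2], after=[4, 3, 2]))  # 2 / 4 = 0.5
```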


Task  Training  Difficulty  Analysis  (TTD) 

The  TTD  is  quite  different  from  the  corresponding  procedures  in  TV-A. 
The  first  step  in  this  analysis  is  to  determine  how  much  time  would  be 
required  to  train  the  most  difficult  task  element  of  all  those  in  the  training 
objectives  (i.e.,  across  all  tasks).  Training  time  here  means  time  to  train 
on  the  operational  equipment.  A  TTD  index  is  derived  for  each  task  by  rating 
each  required  task  element  on  how  much  time  is  needed  to  train  it  on  the 
operational  equipment,  relative  to  the  training  time  required  by  the  most 
difficult task element. The ratings are made using a scale which ranges from
"0" (requires no training) to "4" (requires as much time to train as the most
time-consuming task element) (see Appendix). The index given each task is the
average of the difficulty ratings given each task element, scaled to between
"0" and "1".


Index  of  Training  Device  Effectiveness 


The analyses just presented are used to calculate an overall index of
effectiveness for a training device or concept. The values for TC, PS, and FS
are summed and divided by 3. This value represents the degree of
correspondence between a training device and the operational equipment. Next,
the SKR and TTD values, for each task, are added and divided by 2. This value
represents the amount of training required. In order to calculate the Index of
Training Device Effectiveness, the values representing the degree of
correspondence and the amount of training required are multiplied for each
task. These products are then summed, and the final index is obtained by
dividing by the amount of training required, i.e., (SKR + TTD)/2.


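The final index formula is:

$$
\text{Index} = \frac{\displaystyle\sum_{i=1}^{n} \left[ \left( \frac{TC + PS + FS}{3} \right) \left( \frac{SKR + TTD}{2} \right) \right]_i}{\displaystyle\sum_{i=1}^{n} \left( \frac{SKR + TTD}{2} \right)_i}
$$

where i indexes the n tasks.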
The  TV-B  model  attempts  to  adjust  the  final  index  by  a  correction  factor 
which  reflects  a  loss  of  effectiveness  due  to  unnecessary  cost.  This 
adjustment  factor  is  calculated  as: 

$$
\text{Adjustment factor} = \frac{\text{\# of Required Tasks}}{(\text{\# of Required Tasks}) + (\text{\# of Unique Tasks})}
$$


This factor accounts for capabilities in a device that are not required. The
adjustment factor is applied by multiplying it by the final index. This
adjustment is assumed not to be required when assessing theoretical device
concepts, only existing devices.
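
The adjustment can be illustrated with a short Python sketch. The function
names, the `existing_device` flag, and the sample numbers are hypothetical;
the ratio and its restriction to existing devices follow the description
above.

```python
def cost_adjustment(n_required_tasks, n_unique_tasks):
    """Required tasks as a share of all tasks the device trains."""
    total = n_required_tasks + n_unique_tasks
    if total == 0:
        raise ValueError("device trains no tasks")
    return n_required_tasks / total


def adjusted_index(index, n_required_tasks, n_unique_tasks, existing_device=True):
    """Apply the adjustment to existing devices only, not to device concepts."""
    if not existing_device:
        return index
    return index * cost_adjustment(n_required_tasks, n_unique_tasks)


# Hypothetical existing device: 8 required tasks plus 2 unique tasks.
print(cost_adjustment(8, 2))      # 8 / 10 = 0.8
print(adjusted_index(0.6, 8, 2))  # the unadjusted index of 0.6 is scaled by 0.8
```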


Summary 


TV-B is similar to TV-A in terms of many of the components which enter
into the overall training device index. Two major subdivisions comprise the
Honeywell approach. These include measures that assess the degree of
correspondence between a device and the operational equipment for which it was
developed. The degree of correspondence assessment is similar to TV-A in that
Task Commonality, Physical and Functional Similarity are determined. These
values are later combined with an index of the amount of training required
for a set of tasks; again, similar to TV-A.


A major distinction between TV-A and TV-B is that TV-B does not include
an assessment of the training techniques incorporated in a device. That is,
there is no measure of the appropriateness of the instructional features in
relation to accepted learning principles. Another difference is that while
TV-A adjusts the overall effectiveness index for failing to cover tasks on a
device, TV-B also penalizes a device for including instructional features
beyond those required. The rationale of this latter adjustment is the
assumption that a decrease in training effectiveness results
when unique or unnecessary skills are taught. The rationale continues into
cost considerations as well. That is, additional training in non-required
skills costs more, and therefore is undesirable. These assumptions and
related adjustments may be suspect and unwarranted. Without an assessment of
adherence to accepted instructional or learning guidelines there appears to
be little basis for such a penalization. In fact, there may be instances
where additional skills, beyond those required, may enhance overall transfer
of training and this may go completely unrecognized by an evaluator.


The TV-B approach, however, does emphasize the relationship of
effectiveness with cost considerations more than the TV-A model. This is
particularly relevant when the objective is to assess the total long-term training
cost in relation to effectiveness, as the Guidebook indicates. Indeed,
rarely does device development proceed without cost considerations in terms
of resources required for facilities, equipment, instructional material,
personnel, students, supplies, etc.


Finally, TV-B, like TV-A, relies on a number of assumptions which include
linearity and method of mathematical aggregation. These, along with
other  issues  pertaining  to  reliability  and  validity,  are  major  concerns  and 
will  be  discussed  further  in  a  later  section. 


TRAINVICE-C  (TV-C) 


Overview 

A revised version of the TV-A approach, referred to in this paper as
TRAINVICE-C (TV-C), was developed to increase the practicality and flexibility
of a device effectiveness model (Narva, 1979a; 1979b). TV-C attempted to
provide  a  means  for  answering  three  questions  about  a  training  device: 
"what",  "why",  and  "how". 

The "what" question addresses what should be represented in a device.
Two judgments are required in the answer. The first refers to the
requirement for an activity to be incorporated into a device. The second refers to
whether the device actually covers an activity.

The "why" question tries to uncover the reasons for including training
activities on a device. The two stages of this issue include training
criticality, or the level of proficiency required at the conclusion of training,
and training difficulty, or how hard it is for a trainee to reach that
proficiency level.

The "how" question pertains to the physical and functional characteristics
of a training device. That is, TV-C assesses how well displays and
controls (i.e., physical characteristics) follow accepted instructional or
training guidelines, and the trainer's requirements. In addition, the "how"
refers to the extent to which the functions of displays and controls (i.e.,
functional characteristics) adhere to guidelines on instruction. Judgments are
made for every skill or knowledge required on a training device, with values
corresponding to these judgments substituted in a formula designed to reflect the
percentage of maximum transfer which would be fostered by use of a particular
training device.


Procedures 


Coverage Requirements Analysis (CR)

The first analysis performed in TV-C is the Coverage Requirements
Analysis (CR). The procedure consists of assigning a "1" or a "0" to each
skill or knowledge (from the operational task analysis), depending on whether
or not it should be covered by a device. In other words, this analysis serves
to determine which skills and knowledges warrant training. This screening
process already existed in TV-A, as part of the Learning Deficit Analysis (a
CS rating of "0") (Narva, 1979a, 1979b). TV-C simply highlights this issue for
separate and initial attention. In either case, however, a high premium is
placed on the judgment of a training analyst.


Coverage  Analysis  (C) 

The Coverage Analysis (C) compares the skills and knowledges in the
operational setting with those covered by a training device. Just as in the
Task Communality Analysis for TV-A and the Task Commonality Analysis for TV-B,
a C value of "1" is assigned to each operational skill which is represented
in the training setting, and a "0" when it is not represented. The only
difference between the TV-C approach to coverage and the methods used in
earlier versions of TRAINVICE is that in TV-C ratings are made for each skill,
whereas in the others the rating is made for each subtask.


Training Criticality Analysis (Ct)

All skills receiving a rating of "1" in both of the preceding analyses
are then subjected to the Training Criticality Analysis (Ct). Each skill or
knowledge is rated on the degree of proficiency which will be required after
training (not mission criticality). The scale used to make this rating is
almost identical to the Criterion Scale used in the Training Deficit Analysis
for TV-A (see Appendix). The only difference is that the "0" value was
dropped because a rating of zero proficiency has already been taken into
account by the Coverage Requirements Analysis. The values for the Ct variable
range from "1" to "4".


Training  Difficulty  Analysis  (D) 

In the Training Difficulty Analysis (D) each skill receives a rating,
from "1" (minimal or none) to "4" (substantial), on the degree of difficulty
in learning that skill to required proficiency levels. Aside from the difficulty
inherent in a skill itself, a rater must also consider the proficiency level
of the trainees before training and that required after training. In essence,
this analysis greatly simplifies the TV-A procedures for deriving the weighted
learning deficit, especially the rank ordering of subtask difficulty.


Physical Characteristics Analysis (PC)


In considering the equipment on a training device (i.e., the device
characteristics), the TV-C approach is quite different from that in the earlier
versions of TRAINVICE (i.e., physical and functional similarity). As
alternatives to equipment similarity ratings, the TV-C physical and functional
characteristics analyses represent attempts to have a training analyst assess
more directly "how" a device will train skills. In this sense, the device
characteristics analyses of TV-C resemble the training techniques analysis in
TV-A.

The Physical Characteristics Analysis (PC) addresses the appropriateness
of the physical equipment supporting the training of each skill. Each skill
associated with the controls and displays is translated into a generic
characteristic (e.g., Stimulus Capabilities: Visual Form - Visual Alphanumeric,
etc.). The generic characteristics recommended are those contained in the ISD
model (Braby, Henry, Parrish, and Swope, 1975). Each of the generic
characteristics of the cue or response related to a display or control is rated on
how well it follows available guidelines. The rating scale used ranges from
"0" ("not adequate") to "3" ("outstanding"). The physical characteristics
score, for each control or display, is the sum of the ratings given to each
relevant generic characteristic. Similarly, the physical characteristics
score for each skill is the sum of the scores given to each of its associated
controls and displays.

In order to assist in making the physical characteristics ratings, TV-C
refers a user to a series of learning guidelines (ISD). To use these
guidelines, each skill must first be classified as belonging to one of ten
behavioral categories (e.g., identifying symbols, detections, etc.). For each of the
behavioral categories there is an associated set of learning guidelines.
Narva (1979b) cautions about the lack of specificity of the ISD guidelines.
These were originally intended to assist in the selection of instructional
media. For this reason, the user must be selective in the application of the
learning guidelines. Again, it must be emphasized that use of the guidelines
does not directly generate physical characteristics ratings; it merely alerts
the user to some of the general behavioral considerations associated with each
of the behavioral categories to which a skill might belong.

Functional  Characteristics  Analysis  (FC) 

The Functional Characteristics Analysis (FC) attempts to assess how the
physical characteristics of a training device are used. The first step in
this analysis is to place each skill in one of the ten behavioral categories
(as in the PC analysis). A user then refers to the set of ISD Learning
Guidelines associated with each behavioral category and selects those
appropriate to the specific skill under consideration. Ratings are given to a skill
on how well each of the relevant guidelines is implemented or used in a
training device ("0", not adequate; to "3", outstanding). The FC value given
to each skill is the sum of the ratings made on each of the associated
guidelines.

Index of Predicted Training Effectiveness

The calculation of the TV-C index of effectiveness was designed to
represent the percentage of maximum transfer. The procedures to combine the
components consist of a ratio in which the various values given to a device
are combined in the numerator. The denominator is a combination of the
maximum possible ratings which could have been given. The TV-C formula is:


$$
\frac{\displaystyle\sum_{i} \left( CR \times C \times C_t \times D \times (PC + FC) \right)_i}{\displaystyle\sum_{i} \left( CR \times C \times C_t \times D \times (PC_{max} + FC_{max}) \right)_i}
$$


where:

     CR     =  Coverage Requirements Score
     C      =  Coverage Score
     Ct     =  Training Criticality Score
     D      =  Training Difficulty Score
     PC     =  Physical Characteristics Score
     FC     =  Functional Characteristics Score
     PCmax  =  Maximum Possible Physical Characteristics Score
     FCmax  =  Maximum Possible Functional Characteristics Score

The form of the equation given in the first TV-C report (Narva, 1979a), shown
above, was modified slightly in a second report (Narva, 1979b) to the
following:

$$
\frac{\displaystyle\sum_{i} \left( CR \times C \times C_t \times D \times (PC + FC) \right)_i}{\displaystyle\sum_{i} \left( CR \times C \times 4 \times 4 \times (PC_{max} + FC_{max}) \right)_i}
$$

The value of "4" substituted for the criticality (Ct) and difficulty (D)
variables in the denominator is simply the greatest value either of these
variables could have. Both of the above equations yield indices which range
between "0" and "1". A larger index value (i.e., closer to 1) presumably
indicates a greater potential for transfer to the operational equipment.
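
To make the difference between the two forms of the index concrete, the
following Python sketch computes both. It assumes the per-skill terms are
summed over skills in the numerator and in the denominator. The function name,
the `fixed_max_weights` flag, the dictionary keys, and the sample scores are
all hypothetical.

```python
def tvc_index(skills, fixed_max_weights=False):
    """Ratio of achieved to maximum possible weighted characteristics scores.

    skills: list of dicts with keys CR, C, Ct, D, PC, FC, PCmax, FCmax.
    fixed_max_weights: False uses each skill's own Ct and D in the
        denominator (first report); True substitutes the scale maximum of 4
        for both (second report).
    """
    numerator = 0.0
    denominator = 0.0
    for s in skills:
        gate = s["CR"] * s["C"]  # 1 only if the skill is required and covered
        numerator += gate * s["Ct"] * s["D"] * (s["PC"] + s["FC"])
        weight = 4 * 4 if fixed_max_weights else s["Ct"] * s["D"]
        denominator += gate * weight * (s["PCmax"] + s["FCmax"])
    if denominator == 0:
        raise ValueError("no skill is both required and covered")
    return numerator / denominator


# Two hypothetical skills, both required and covered.
skills = [
    {"CR": 1, "C": 1, "Ct": 4, "D": 2, "PC": 6, "FC": 3, "PCmax": 9, "FCmax": 6},
    {"CR": 1, "C": 1, "Ct": 2, "D": 1, "PC": 3, "FC": 3, "PCmax": 6, "FCmax": 3},
]
print(tvc_index(skills))                          # 84 / 138
print(tvc_index(skills, fixed_max_weights=True))  # 84 / 384
```

With the fixed weights, a device can reach 1 only when every covered skill is
rated 4 on both criticality and difficulty, so the second form yields lower
values for the same ratings.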

Summary  and  Critique 




Like  the  other  versions  of  TRAINVICE,  TV-C  attempts  to  assess  the  training 
transfer  potential  of  a  training  device  by  assigning  values  to  a  variety  of 
judgments  about  a  device. 

Essentially, there are three major subdivisions within TV-C: an input, a
training analysis, and a device characteristics analysis. The inputs include
the operational and training requirements which are derived from a task
analysis of each situation (i.e., operational and training). The training
analysis is an estimation of the required level of proficiency and the difficulty
of arriving at that level for each trainee. Device characteristics analyses
include an evaluation of the physical and functional aspects of components
incorporated into a device as these adhere to accepted instructional or
learning principles.

In two papers (ARI RM 79-6 and 79-7), Narva (1979a, 1979b) outlined an
extensive modification of the original TRAINVICE predictive model. The
procedures described in these two papers are identical; only the calculation
of the index was modified in the second paper (i.e., Narva, 1979b). The most
striking difference between TV-C and the earlier models is the omission of an
equipment similarity (fidelity) analysis. The training techniques analysis,
which had been dropped in TV-B, was reintroduced in TV-C in the form of two
separate analyses (physical and functional characteristics analyses). TV-C
also contains a coverage requirement (or media selection) analysis, not
included in TV-A or TV-B. The procedures and rating scales used in the
various preparatory analyses were almost completely changed in TV-C. Also,
the level at which these analyses are performed is the individual skill
level, not the subtask level. Considerable changes were also made in the
procedures used to calculate an overall index of effectiveness.

TV-C included Training Criticality and Training Difficulty analyses as
weighting factors for required skills and knowledges. A skill or knowledge
which is required at a high level of proficiency and is also difficult to
learn is therefore assumed to have more significance than one requiring
a lower proficiency and which is easier to learn. Suppose, for example, that
of two devices under evaluation, one covers an important skill while the other
does not. The evaluation model was originally intended to penalize a device in
such a situation. TV-C fails to accomplish this. A "0" C value for a skill
that is required but not covered causes that skill's terms in both the
numerator and the denominator to go to "0". The result is as if that skill
never existed. As will be discussed shortly, TV-D corrected this situation.

The terminology of Training Criticality Analysis is somewhat misleading.
The word Criticality seems to suggest the notion of importance, either in
the mission or training setting. As presented earlier, the Criticality
analysis addresses the required level of proficiency for trainees, and has
nothing directly to do with criticality.

The criteria for a user to make judgments about each of the analyses
appear to be too vague. The scale for the D analysis, for example, is:

1 = minimal or none

2 = some

3 = much

4 = substantial

These descriptions of the rating scale may reduce the reliability of the
application. That is, because the definitions lack specificity, judgments by
different users may vary according to individual interpretations. This
possibility exists whenever scales of this type are used; however, the more
specific the criteria for assigning values, the less likely it is that
differences in interpretation will occur. This again points to the need to
validate the methodology in terms of both construct and predictive validity.
This issue will be discussed further in a later section of this paper.


TRAINVICE-D (TV-D)
Overview 


In a project to develop a user's guidebook for TV-C, further revisions
were made to the evaluation model (see Swezey and Evans, 1980; Evans and
Swezey, 1981). Despite their differences, TV-C and TV-D are almost identical
in the variables or model components considered and in the procedures used to
estimate these variables.


Two  general  uses  of  this  model  have  been  identified  as  predictive  or 
prescriptive  applications.  A  predictive  application  is  used  when  existing 
training  devices  are  available  and  a  user  wants  to  evaluate  (or  predict)  their 
effectiveness.  In  its  prescriptive  mode,  the  model  is  purported  to  assist 
device  developers  in  making  design  decisions  in  the  early  concept  stages. 
Components  are  applied  either  separately  or  in  combination.  When  components 
are  combined,  an  overall  index  of  device  effectiveness  is  derived.  The 
overall  index  or  separate  components  analyses  are  only  of  value  when  two  or 
more  devices  are  under  evaluation.  In  one  sense  this  restriction  is  the 
result  of  the  overall  index  having  no  intrinsic  or  absolute  meaning.  In 
another,  each  of  the  components  can  be  used  as  a  comparative  assessment  to 
identify  deficiencies  in  a  device  under  evaluation. 

The  components  of  this  model  are: 

o  Coverage  (C) 

o  Training  Proficiency  (P) 

o  Learning  Difficulty  (D) 

o  Physical  Characteristics  (PC) 

o  Maximum  Possible  Physical  Characteristics  (PCmax) 
o  Functional  Characteristics  (FC) 

o  Maximum  Possible  Functional  Characteristics  (FCmax) 


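The formula for executing TV-D is

\[
\text{TV-D index} =
\frac{\displaystyle\sum_{i=1}^{N}
      \left( \frac{PC_i + FC_i}{PC_{max,i} + FC_{max,i}} \right)
      \left( C_i \times P_i \times D_i \right)}
     {\displaystyle\sum_{i=1}^{N} \left( P_i \times D_i \right)}
\]

where the sums are taken over the N required skills and knowledges.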


A device evaluation actually begins by determining whether training is
required on the skills and knowledges needed for performance on the
operational equipment. Once the training skills or knowledges have been
identified, a user then conducts analyses using the model components. These
are briefly described below.


Procedures 


Because TV-D is a direct derivative of TV-C, it addresses the same
"What", "Why", and "How" questions that TV-C does. As will be discussed
shortly, however, some changes have been made.


Coverage  Requirements  Analysis  (CR) 

Although not formally used in the overall index formula, the Coverage
Requirements Analysis (CR) helps determine which skills or knowledges required
in the operational setting should be represented on a training device.
Working from a consolidated list of skills and knowledges, a user decides
whether each skill or knowledge should be covered by a training device. If
the decision is "yes", then a value of "1" is assigned to that skill; a "no"
decision receives a "0".


Coverage  Analysis  (C) 

If a skill or knowledge is required, a user must then decide if that
skill or knowledge is actually represented. A Coverage Analysis (C) value of
"1" indicates that it is, while "0" indicates it is not. If the analysis is
conducted early in a device development phase, then a required skill (i.e.,
CR = "1") which was not originally covered in a device design (i.e., C = "0")
can be included. The effect of failing to cover a required skill is reflected
in a lower overall index for a particular device.


Training  Proficiency  Analysis  (P) 

This component assigns a value corresponding to the degree of proficiency
which a trainee must attain for each skill or knowledge subsequent to training
on a device. The Training Proficiency Analysis (P) is conducted on each skill
or knowledge which received a CR value of "1", even if one device in a
comparison failed to cover (i.e., C = "0") a particular skill or knowledge.

A four-point (i.e., "1" to "4") rating scale is used to assign a P value,
where "1" corresponds to a level requiring limited knowledge. When expert
levels of knowledge are required, a P value of "4" is assigned. The P values
are then summed across all skills and knowledges.


Learning  Difficulty  Analysis  (D) 


The Learning Difficulty Analysis (D) specifies the degree of learning
difficulty associated with attaining a required skill or knowledge. Several
factors have been identified which enter into a user's decision in assigning a
D value. These are the:

o level of skill/knowledge proficiency to be
attained by a trainee

o entry-level capabilities of a trainee
(i.e., pre-training on the skills or
knowledges)

o level of learning difficulty typically inherent
in a skill or knowledge

In making D judgments, a user assigns a value ranging from a low of "1" to a
high of "4". The higher a D value, the more difficult a skill or knowledge is
to learn. Like the P analysis, D values are assigned only to skills or
knowledges which have been determined to be required (i.e., CR = 1) and then
summed.


Physical  Characteristics  Analysis  (PC) 

This is the first of two analyses which are referred to as Device
Characteristics Analyses. In other words, attention is now focused on
analyzing displays and controls on a device. The Physical Characteristics
Analysis (PC) assesses how well the physical characteristics of a device
support guidelines or principles of good instruction. A separate PC analysis
is conducted for each device under consideration.

In conducting a PC analysis, a user must first determine the type of
behavior that is required to accomplish a particular skill or knowledge. Each
skill or knowledge is assigned to a behavioral category which corresponds to
the type of performance required by a trainee. These behavioral descriptions
were adapted from the U.S. Army Interservice Procedures for Instructional
Systems Development (TRADOC Pam. 350-30, 1975). Next, a user decides which of
the instructional practices listed under each behavioral category are
applicable for developing the type of behavior associated with a skill or
knowledge. These instructional practices or guidelines represent a standard
against which each device will be evaluated. Because these guidelines
correspond to skills or knowledges, they remain the same for each device under
evaluation.

A user then identifies the Generic Stimulus and Response Characteristics
for each display and control which correspond to particular skills and
knowledges. That is, a user must identify the stimulus characteristics of
displays and learner response modes. The list of possible stimulus
characteristics (i.e., capabilities) and response modes is that presented by
Braby, Henry, Parrish and Swope (1975). The PC analysis concludes by assigning
a value or rating on how well each generic characteristic of a display or
control supports the good instructional practices identified earlier. Values
of the PC analysis range from "0" (extremely deficient in implementing the
guidelines) to "3" (highly proficient implementation) for each skill or
knowledge. The total PC score then becomes the sum of the values assigned to
each skill or knowledge.


Maximum  Possible  Physical  Characteristics  (PCmax) 

The maximum possible physical characteristics (PCmax) value for each
skill or knowledge is simply three times the number of applicable generic
stimulus and response characteristics.


Functional  Characteristics  Analysis  (FC) 

The second device characteristic analysis is the Functional
Characteristics Analysis (FC). The FC analysis is similar to the PC Analysis
in that it assesses how well the functional elements of a training device
follow guidelines for good instructional practice. Skills and knowledges are,
again, compared to the behavioral categories, and good instructional practices
under each category. These instructional guidelines are now identified solely
for functional and not physical characteristics. Again, these form a standard
to which the functional worth of displays and controls is compared. In
completing the FC Analysis, a user rates how well each display and control,
corresponding to a skill or knowledge, implements the functional guidelines
for good instructional practice. The scale used ranges from "0" (extremely
deficient implementation) to "3" (highly proficient implementation) of the
guidelines for each skill or knowledge. These values are summed for all
skills and knowledges under consideration (i.e., CR = "1").


Maximum  Possible  Functional  Characteristics  (FCmax) 

Like  the  PCmax,  the  maximum  possible  functional  characteristics  score  is 
three  times  the  total  number  of  applicable  functional  guidelines. 
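
As a minimal sketch of how the obtained and maximum characteristics scores
relate for a single skill or knowledge, the following assumes one rating on
the "0" to "3" scale per applicable characteristic or guideline. The function
name and the example ratings are hypothetical and are not taken from the
TRAINVICE guidebooks.

```python
MAX_RATING = 3  # top of the "0" to "3" scale used in the PC and FC analyses


def characteristics_ratio(pc_ratings, fc_ratings):
    """Return (PC + FC) / (PCmax + FCmax) for one skill or knowledge.

    Assumption: each list holds one rating (0..3) per applicable physical
    characteristic or functional guideline for that skill.
    """
    pc, fc = sum(pc_ratings), sum(fc_ratings)
    pc_max = MAX_RATING * len(pc_ratings)  # three times the number applicable
    fc_max = MAX_RATING * len(fc_ratings)
    return (pc + fc) / (pc_max + fc_max)


# Hypothetical ratings: three physical characteristics, two functional guidelines.
ratio = characteristics_ratio(pc_ratings=[3, 2, 1], fc_ratings=[2, 3])
print(round(ratio, 3))  # 0.733
```

A ratio of 1 would mean that every applicable characteristic received the top
rating.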


Index  Calculation 

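The calculation of the final index is completed by simply substituting
the values of each analysis discussed above and carrying out the operations in
the formula:

\[
\text{TV-D index} =
\frac{\displaystyle\sum_{i=1}^{N}
      \left( \frac{PC_i + FC_i}{PC_{max,i} + FC_{max,i}} \right)
      \left( C_i \times P_i \times D_i \right)}
     {\displaystyle\sum_{i=1}^{N} \left( P_i \times D_i \right)}
\]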


The resulting index will be a number between "0" and "1". The closer this
value is to "1", the better the training transfer capability of a device. The
overall index, however, only has value when comparing two or more existing
devices or device concepts.
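
The index calculation can be sketched in a few lines of code. This is an
illustration only, assuming one set of ratings per required skill (CR = 1);
the class name, field names, and example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SkillRating:
    """Ratings for one required skill or knowledge (CR = 1)."""
    c: int       # Coverage: 1 if the device represents the skill, else 0
    p: int       # Training Proficiency, 1..4
    d: int       # Learning Difficulty, 1..4
    pc: int      # Physical Characteristics score
    pc_max: int  # maximum possible PC score
    fc: int      # Functional Characteristics score
    fc_max: int  # maximum possible FC score


def tvd_index(skills):
    """Sum the weighted characteristics ratios and divide by the sum of P x D."""
    numerator = sum(
        (s.pc + s.fc) / (s.pc_max + s.fc_max) * s.c * s.p * s.d
        for s in skills
    )
    denominator = sum(s.p * s.d for s in skills)
    return numerator / denominator


# Hypothetical device: three required skills, the second one not covered.
device = [
    SkillRating(c=1, p=4, d=3, pc=6, pc_max=9, fc=5, fc_max=6),
    SkillRating(c=0, p=2, d=2, pc=0, pc_max=6, fc=0, fc_max=3),
    SkillRating(c=1, p=1, d=1, pc=3, pc_max=3, fc=3, fc_max=3),
]
index = tvd_index(device)
```

Because the uncovered skill still contributes its P x D product to the
denominator, it lowers the index even though it adds nothing to the numerator.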


Summary 

While quite similar to previous approaches, some changes have been
instituted. Training Proficiency Analysis (P) was formerly called Training
Criticality in TV-C. The term "Criticality" was considered misleading,
perhaps suggesting importance of a skill, which was not the case.

The principal difference between the TV-D formula and that of TV-C is the
removal of the coverage variable from the denominator. In this way the credit
or penalty supposedly given to a device for covering or failing to cover a
particular skill is weighted by the Training Proficiency and the Learning
Difficulty scores. That is, an overall index of effectiveness would be
enhanced more for covering skills that require a high degree of proficiency
and are difficult to learn than for covering relatively trivial skills.
Similarly, when a skill is not covered, the degree to which an overall
effectiveness index is decreased is weighted by the proficiency and difficulty
scores for that skill. In the TV-C formula, the credit given for coverage of a
skill was weighted by criticality and difficulty; however, lack of coverage
was not penalized at all. The presence of the coverage variable in both the
numerator and denominator would cause the terms for an uncovered skill to
equal zero in both. Thus, in TV-C, each skill not covered by a training device
neither contributes to nor takes away from an overall effectiveness index.
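
The effect of moving the coverage variable can be illustrated with a small
numerical sketch. It assumes, for illustration only, a "TV-C-like" variant
that differs from the TV-D form solely by carrying C in the denominator; the
actual TV-C equation is not reproduced here, and all names and values are
hypothetical.

```python
def index(skills, coverage_in_denominator):
    """skills: list of (c, p, d, ratio) tuples, one per required skill.

    ratio stands for (PC + FC) / (PCmax + FCmax) for that skill.
    """
    numerator = sum(c * p * d * ratio for c, p, d, ratio in skills)
    if coverage_in_denominator:
        # Assumed TV-C-like form: an uncovered skill (c = 0) drops out entirely.
        denominator = sum(c * p * d for c, p, d, _ in skills)
    else:
        # TV-D form: every required skill contributes P x D.
        denominator = sum(p * d for _, p, d, _ in skills)
    return numerator / denominator


# Two hypothetical devices; the second fails to cover the demanding skill.
covers_both = [(1, 2, 2, 0.8), (1, 4, 4, 0.8)]
misses_one = [(1, 2, 2, 0.8), (0, 4, 4, 0.0)]

tvc_like = (index(covers_both, True), index(misses_one, True))
tvd = (index(covers_both, False), index(misses_one, False))
```

Under the assumed TV-C-like form both devices score about 0.8, so the omission
goes unpenalized; under the TV-D form the second device falls to about 0.16.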

Four  of  the  rating  scales  used  in  the  preparatory  analyses  for  TV-C  were 
modified  in  TV-D.  These  are  the  scales  used  in  the  Training  Difficulty 
Analysis,  Training  Proficiency  Analysis  ("Criticality"  in  TV-C),  and  Physical 
and  Functional  Characteristics  Analyses.  In  all  cases,  wording  changes  were 
made  in  the  attempt  to  provide  more  guidance  to  the  user  than  had  been 
available  in  TV-C  (see  Appendix).  There  were  no  changes,  however,  in  the 
numerical  properties  of  the  scales. 

The Physical and Functional Characteristics Analyses contain additional
changes in guidance given to users. The ten behavioral categories (from ISD)
used in TV-C were given expanded definitions accompanied by examples. TV-D
incorporated new learning guidelines, associated with each behavioral
category, which were modifications of those already in the ISD. Moreover, each
learning guideline was identified with a "P", "F", or "P/F" to indicate
whether a particular guideline was relevant to analyzing the physical
characteristics, the functional characteristics, or both.

Several issues of concern have evolved regarding the application of
TV-D. Actually these issues appear equally valid for the earlier models as
well. The first is the manner in which values for the various components are
aggregated into a single index. The components of TV-D appear to form a
series of fractions, all based on separate criteria. These then become
accumulated or summed in violation of basic rules for such addition. In other
words, there is no attempt to find a common denominator.

A second concern is that different guidelines on "good instructional
practices" are used for the PC and FC analyses. Further, the procedure for
designating the PC and FC values is cumbersome. Both of these issues seem to
increase the possibility of poor reliability in assigning values.

By necessity, it seems that a long list of skills and knowledges is
required to apply TV-D. Once these are identified, a series of additions and
multiplications is required. Again, reliability seems to be vulnerable, if
for no other reason than arithmetic errors. In addition, a user must begin a
TV-D analysis with a consolidated list of skills and knowledges derived from
the list of all skills and knowledges required in the operational setting.
Construction of the consolidated list requires a user to eliminate from
consideration those skills and knowledges that are repeated on more than one
task or subtask. A TV-D index, therefore, is derived on only a selected
number of skills and knowledges, with no implication for a particular skill
being repeated. Perhaps a logical argument can be made that if a skill or
knowledge appears in more than one task, then that repetition should indicate
some degree of importance. Yet, in executing TV-D, all skills begin as equal,
with only proficiency and difficulty as primary considerations or weighting
factors.
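
The information lost in consolidation can be made concrete with a short
sketch. The task and skill names below are hypothetical; the point is only
that the consolidated list keeps each skill once, while a simple tally would
retain how many tasks call on it.

```python
from collections import Counter

# Hypothetical task analysis: skills required by each operational task.
task_skills = {
    "task_1": ["read gauge", "set dial"],
    "task_2": ["read gauge", "report status"],
    "task_3": ["read gauge", "set dial"],
}

# Number of tasks in which each skill appears.
occurrences = Counter(
    skill for skills in task_skills.values() for skill in skills
)

# The consolidated list keeps one entry per skill; the counts are discarded.
consolidated = sorted(occurrences)

print(consolidated)               # ['read gauge', 'report status', 'set dial']
print(occurrences["read gauge"])  # 3
```

Whether such a count should enter the index as an additional weighting factor
is left open here; the sketch only shows what the consolidated list omits.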

Another  issue  of  concern  is  the  reliance  on  TRADOC  Pam.  350-30  as 
providing  "good  instructional  guidelines."  These  guidelines  were  developed 
for  training  programs  in  general  and  not  for  training  devices.  This 
application  in  a  device  effectiveness  method  is  suspect. 



Summary  of  the  Models 


Input 

All four TRAINVICE models require task analytic and equipment information
as input. The models vary somewhat in the detail of the task information
required for input, as well as in the task taxonomic level at which variable
values are estimated (e.g., task-by-task) (see Table 1). There are two types
of equipment information required: physical characteristics of the controls
and displays (i.e., size, location, etc.) and functional characteristics
(operation and use of the equipment). The four models are comparable in the
amount of detail required in the physical information. There are, however,
differences among the models in the level of resolution required in the
functional information. The two models which involve equipment similarity
analyses (TV-A and TV-B) require specification of the amount of information
(in bits) transmitted between the human operator and the controls and
displays. The two models without similarity analyses (TV-C and TV-D) may need
more general accounts of the stimuli (or cues) supporting the behavior and the
types of responses required.

TABLE 1

Models    Input Resolution       Level of Analysis

TV-A      Subtask                Subtask
TV-B      Task element; Skill    Task
TV-C      Skill                  Skill
TV-D      Skill                  Skill

Preparatory  Analyses  &  Model  Variables 

The  four  TRAINVICE  models  involve  several  general  types  of  preparatory 
analyses.  Table  2  shows  the  commonalities  among  the  models  in  terms  of  these 
analyses.  Each  kind  of  analysis  produces  an  estimate  of  a  value  for  a  partic¬ 
ular  variable. 

In the coverage and commonality analyses, a "1" or "0" is used primarily
to penalize for non-coverage of skills. Penalization issues are most relevant
to each model's equations, and will be discussed below.

The class of variables in Table 2 called "Learning" variables concern:
1) the amount of increase required in the proficiency levels of trainees, and
2) the amount of difficulty inherent in training each task. In TV-A both of
these are combined into a Weighted Learning Deficit score. In other words, an
estimate of incoming trainee skill level is subtracted from a criterion
proficiency level. This difference is then weighted by the ranked difficulty
of training that particular skill. In TV-B, the difference in proficiency
levels (Skills and Knowledge Requirements) is estimated in a similar way to
TV-A. Training Difficulty is estimated more simply by a rating scale, instead
of a ranking procedure. The two values (proficiency requirements and training
difficulty) are averaged in the TV-B final equation. In both TV-C and TV-D,
the values of the proficiency and difficulty variables are estimated with
rating scales and are kept separate throughout subsequent calculations.

The  equipment  similarity  and  training  techniques  variables  are  the  only 
model  components  which  are  concerned  with  the  features  of  a  training  device 
and  how  well  they  will  support  training.  As  can  be  seen  in  Table  2,  the 
models  vary  widely  in  their  emphasis,  or  lack  of  emphasis,  on  each  of  these 
variables. 

The one model which addresses both variables is TV-A. Here, equipment
similarity has two components: physical and functional similarity. Values
are assigned to each and are averaged for an overall Similarity score. In
order to derive the training techniques score, a user first categorizes each
subtask according to the task taxonomy of Braby et al. (1975). The task
category then refers the user to a special set of learning principles for that
category (after Willis and Peterson, 1961; and Micheli, 1972). The principles
concern stimulus, response, and feedback aspects of equipment. A conservative
estimate is made regarding the implementation of these principles by a device,
which then generates a value for the Training Techniques variable.

In TV-B, training techniques are ignored, with an average of physical and
functional similarity scores as the only predictor variable. The analysis
used to generate the Similarity score is almost identical to that in TV-A.

TV-C and TV-D abandon equipment similarity as a separate analysis. It is
hard to disagree with this because there is little literature supporting the
assumption of a general, monotonic relationship between equipment fidelity and
training effectiveness (a minimal criterion for the selection of any predictor
variable). The traditional assumption of such a relationship has undoubtedly
been based on approaches to transfer of training such as Osgood's (1949). The
problem with such an assumption, in the context of training devices, is that
it must lead to the conclusion that the best device for training is the
operational equipment itself. Put differently, this approach assumes that the
cues necessary to maintain skilled performance on the operational equipment
are sufficient, and in fact optimal, to support learning.

The level of stimulation present on a training device, however, may have
different effects on various kinds of learners. Skilled performers, for
example, have already learned to use to their advantage all the relevant cues
available in the operational environment. To a novice, however, the
stimulation presented by the operational environment may be, in large part,
noise (i.e., a source of distraction) and therefore a hindrance to learning.
Sometimes, therefore, it may be desirable to reduce the number of cues
available (i.e., lower fidelity) during initial training. In other situations,
it may be desirable to increase the amount of information presented in the
training environment in order to augment feedback and knowledge of results.
In yet other simulations, compressing the time frame of a task series may
enhance training.

While presently there may be insufficient knowledge regarding
relationships between fidelity and training effectiveness to warrant its use
as a predictor variable (without qualifications), equipment similarity cannot
be ignored. In the solution adopted by TV-C and TV-D, fidelity is considered
only in the context of fairly specific task domains, not as an end in
itself. TV-C and TV-D adopted what is in essence an amplification of the
Training Techniques Analysis in TV-A. The modified analysis (in TV-C and
TV-D) directs a user to different sets of learning principles for different
skills. Using these principles, a user assesses how well the physical and
functional characteristics of a training device support training. The
learning principles vary for each skill category. For some skills, the
relevant principles include guidelines concerned with some aspect of equipment
similarity. For other kinds of skills, fidelity is de-emphasized. Realistic
and continuous feedback is recommended for tracking tasks, for example,
whereas "equipment realism can be at a minimum" for procedural tasks.


Model  Output 


The values determined for the preparatory analyses are combined in a
specific computational formula for each model (see Table 3). Each formula is
used to generate an overall index of training effectiveness which ranges
between 0 and 1; the higher the index, the more effective a training device.
All of the equations used by the models have been designed to predict training
effectiveness, with overlap in the variables considered. The only
mathematical property common to all of the formulae is the use of linear
combinations. That is, the variables are combined in a simple multiplicative
fashion.


TV-A is the only model whose formula was based on the Gagne et al.
(1948) savings measure of transfer of training. The index of effectiveness
for a device is determined by the equipment similarity and training techniques
scores, weighted by the learning deficit score. The weighting strategy
employed was the "weighted mean". The general form taken by a weighted mean
is: If each value $x_i$ is associated with a weighting factor $w_i$, where
$w_i \geq 0$, then $\sum_{i=1}^{n} w_i$ is the total weight, and:

\[
\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}
\]

Note that the weights cannot have negative values.
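
A weighted mean of this general form can be computed as in the sketch below.
The function name and the example scores and weights are hypothetical; in the
TV-A setting the values would play the role of device scores and the weights
the role of learning deficit scores.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i) for non-negative weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights cannot be negative")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one weight must be positive")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Hypothetical example: two subtasks with scores 0.9 and 0.5, weighted 3 to 1.
result = weighted_mean([0.9, 0.5], [3, 1])
```

With these values the heavily weighted subtask dominates, giving a result of
about 0.8 rather than the unweighted average of 0.7.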

The equation used to generate the index in TV-B is not clearly related to
any particular transfer of training measure. The TV-B index is determined by
equipment similarity scores weighted by the required skills and knowledges and
task training difficulty scores. Together, these two variables cover
information similar to that in the learning deficit score of TV-A. The manner
in which the weighting is accomplished in TV-B can only be considered a
weighted mean when there are no "unique tasks" trained by a device.

Table 3

Model Equations for the Calculation of Overall Effectiveness Indices

N.B. See Table 2 for variable names.

The index of TV-C was developed to reflect the percentage of maximum
transfer possible. The equation used to compute the index, therefore, is a
ratio of the variable values estimated for a particular training device,
divided by the maximum values which could be assigned to those variables (with
the exception of the coverage variables).

The  only  part  of  the  TV-D  formula  which  retains  the  above  percentage  is 
the  ratio  of  physical  and  functional  characteristics  scores,  to  their  maximum 
values.  The  rest  of  the  equation  has  been  revised,  primarily  for  reasons 
related  to  penalization  of  a  device  for  non-coverage  of  particular  skills. 

In computing the overall figure of merit for a training device, a
coverage penalty has been included in various ways in the four models. TV-A
penalizes a device for not covering subtasks which require training. The
penalization strategy used in TV-B lowers the index both for: 1) not covering
tasks requiring training, and 2) covering tasks which do not require training
(i.e., unique tasks). The implementation of this penalty in TV-B is present
in almost all of this model's preparatory analyses, as well as being part of
the final equation. The reason given for the penalization of unique tasks was
that it would allow the TV-B index to reflect an unnecessary increase in cost,
while lowering training effectiveness. The major problem with this rationale
is the underlying assumption that all "extra" training features cost the same
amount and generally lower effectiveness; i.e., the penalty is equal for all
unique tasks. TV-B is the only model to use this penalization strategy.

In TV-C, there is no penalization for non-coverage. If a skill is not
covered by a training device, zeroes are entered into the summations in both
the numerator and denominator of the final equation. That is, nothing is
contributed or taken away for skills not covered. The TV-D formula
reintroduced the penalty for non-coverage. Moreover, the penalty for not
covering a particular skill is proportional to the "importance" of that skill
(i.e., adjusted by the proficiency and difficulty variables). In other words,
the credit for coverage and the penalty for non-coverage are both weighted by
the same variables.


Prescriptive  Mode 


In addition to its use in evaluating alternative training devices, an
analytic model (such as TRAINVICE) is also needed to provide guidance in the
specification of training device characteristics. That is, what is required
is a prescriptive model as well as a predictive one. Whether or not both of
these functions can be performed by one of the TRAINVICE models (or any other
single model) remains to be seen. In all of the TRAINVICE publications, there
is only one strategy recommended for the use of a predictive model in the
prescriptive mode. This strategy is simply to perform the predictive
procedures (ratings, etc.) with a device's design specifications as input. An
index of the device's potential training effectiveness (if built) is then
generated. If a prediction of poor transfer of training results, the device's
design can then be changed in an attempt to improve its effectiveness. The
new design can then be evaluated by generating a new prediction, and so on.
In other words, the model does not directly specify the most desirable
training device characteristics. Rather, the model is used to give feedback on
the effectiveness of a proposed device, thus providing indirect guidance in
the design process.


Wheaton,  Fingermnn,  Rose,  and  Leonard  (1976)  caution  us  about  such  an 
early  (in  the  life  cycle  of  a  device)  application  of  a  predictive  model. 
They  state  that  an  early  application  would  rely  almost  exclusively  on  the 
Training  Device  Requirements  document  (TDR),  and  that  the  information  in  the 
TDR  would  be  of  insufficient  breadth  and  quality  to  allow  performance  of  the 
model's  preparatory  analyses.  The  only  solution  offered  by  Wheaton,  Fingerman,
Rose,  and  Leonard  (1976)  is  the  reformatting  of  the  TDR.

In  the  absence  of  a  major  change  in  the  TDR's  scope  and  level  of  detail, 
the  question  will  remain:  Can  an  analytic  model  demonstrate  an  acceptable 
amount  of  predictive  power  when  relying  on  rather  unspecific  task  and  equip¬ 
ment  information?  That  is,  can  a  predictive  model  work  with  low  resolution 
input?  If  the  answer  to  this  question  is  negative,  then  the  other  question
which  remains  is:  Can  a  truly  (i.e.,  directly)  prescriptive  model  be 
developed? 


Separate  Indices 


The  overall  index  of  effectiveness,  generated  by  each  of  the  models, 
would  clearly  be  of  use  when  a  choice  must  be  made  between  two  competing 
training  devices.  The  single  figure  of  merit  for  each  device  provides  the 
decision  maker  with  rather  straightforward  guidance;  i.e.,  a  "bottom  line". 
The  utility  of  an  overall  index  would,  however,  be  minimal  when  decisions 
must  be  made  concerning:  1)  training  device  design  specifications  and
modifications  (prescription);  and  2)  development  of  a  program  of  instruction
which  will  complement  the  strengths  and  compensate  for  the  weaknesses  of  a 
training  device  (implementation).  Either  situation  demands  guidance  which 
is  task,  or  perhaps  skill,  specific.  In  other  words,  what  is  needed  is  a 
separate  index  of  training  effectiveness  for  each  task  (or  skill).  Whatever 
the  form  that  a  separate  index  eventually  takes,  its  development  will  con¬ 
tribute  not  only  to  the  task  specific  questions  of  design  and  implementa¬ 
tion,  but  also  to  the  construction  of  a  valid  overall  index. 
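
A short sketch may make the limitation of a single figure of merit concrete. The aggregation rule (a weighted mean), the task names, and the index values below are all assumptions chosen for illustration. None of them come from the TRAINVICE documentation.

```python
from typing import Dict, List


def overall_index(task_indices: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean of per-task indices (an assumed aggregation rule)."""
    total_weight = sum(weights[task] for task in task_indices)
    weighted_sum = sum(task_indices[task] * weights[task] for task in task_indices)
    return weighted_sum / total_weight


def weak_tasks(task_indices: Dict[str, float], threshold: float) -> List[str]:
    """Tasks whose separate index falls below the threshold, in name order."""
    return sorted(task for task, value in task_indices.items() if value < threshold)


# Two hypothetical devices with invented per-task indices.
weights = {"load": 1.0, "aim": 1.0, "fire": 1.0}
device_a = {"load": 0.5, "aim": 0.5, "fire": 0.5}
device_b = {"load": 1.0, "aim": 0.25, "fire": 0.25}

same_overall = overall_index(device_a, weights) == overall_index(device_b, weights)
needs_support = weak_tasks(device_b, threshold=0.5)
```

In this invented case both devices receive the same overall index, yet only the separate indices show that the second device would need supplementary instruction on two of its three tasks.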


IV 


Conclusions 


This  report  reviewed  the  TRAINVICE  models  for  predicting  training  device 
effectiveness.  The  models  were  presented  as  they  were  reported  in  the  origin¬ 
al  documentation.  It  is  hoped  that  we  have  remained  faithful  to  the  original 
authors'  intents. 

TRAINVICE  appears  to  be  a  promising  method  for  analytically  assessing 
training  device  effectiveness  during  various  stages  of  development.  But 
progress  in  developing  and  refining  the  methodology  has  been  slow.  Army 
decision-makers  need  and  can  use  a  TRAINVICE  approach  now.  Unfortunately,  the 
research  community  is  not  ready  to  field  this  methodology. 

To  meet  this  demand,  ARI  is  conducting  programmatic  research  to  validate 
and  refine  TRAINVICE  methodology.  As  part  of  this  research,  a  priori  investi¬ 
gations  of  the  mathematical  sensitivity  and  distributional  properties  of  the 
models  are  planned.  The  core  of  these  sensitivity/distribution  tests  will  be 
computer  programs  based  on  each  of  the  TRAINVICE  equations.  The  general 
procedure  to  be  followed  will  be  the  generation  of  index  values,  given  system¬ 
atic  variation  of  component  variable  values. 
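
The general procedure can be sketched as follows. The index function here, `toy_index`, is a hypothetical placeholder with invented variable names. A real test would substitute a program based on each TRAINVICE equation. A variable may not be named "index", since that key is reserved for the generated value.

```python
from itertools import product
from statistics import mean
from typing import Callable, Dict, List, Sequence


def sweep(
    index_fn: Callable[..., float], levels: Dict[str, Sequence[float]]
) -> List[Dict[str, float]]:
    """Generate an index value for every combination of variable levels."""
    names = list(levels)
    rows = []
    for combo in product(*(levels[name] for name in names)):
        settings = dict(zip(names, combo))
        rows.append({**settings, "index": index_fn(**settings)})
    return rows


def main_effect(rows: List[Dict[str, float]], name: str) -> Dict[float, float]:
    """Mean index at each level of one variable, averaged over the others."""
    by_level: Dict[float, List[float]] = {}
    for row in rows:
        by_level.setdefault(row[name], []).append(row["index"])
    return {level: mean(values) for level, values in by_level.items()}


def toy_index(similarity: float, importance: float) -> float:
    """Hypothetical index: a simple product of two component variables."""
    return similarity * importance


rows = sweep(toy_index, {"similarity": [0.0, 0.5, 1.0], "importance": [0.5, 1.0]})
similarity_effect = main_effect(rows, "similarity")
```

The full list of rows gives the distribution of index values. The per-variable means give a first, rough indication of how sensitive the index is to each component.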

Validation  efforts  will  consist  of  comparisons  of  model  predictions  and 
empirically  obtained  transfer  of  training  data.  Efforts  are  being  made  to 
identify  a  variety  of  training  devices  and  simulators  which  have  recently  been 
(or  will  soon  be)  empirically  evaluated.  For  each  device,  judgmental  data 
will  be  collected  on  the  variables  considered  by  each  of  the  analytic 
models.  In  this  manner,  an  index  of  effectiveness  can  be  generated  using  each 
model,  and  all  indices  can  be  compared  to  the  same  set  of  empirical  data. 
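
One way such a comparison might be scored is a rank correlation between each model's indices and the empirical transfer measures. The report does not name a statistic, so the choice of Spearman's coefficient below is an assumption, and the numbers in the usage lines are invented placeholders, not data.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i+1 through j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Product-moment correlation; fails if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Rank correlation between model indices and empirical transfer measures."""
    return pearson(average_ranks(predicted), average_ranks(observed))


# Invented placeholder values for four hypothetical devices.
model_indices = [0.2, 0.4, 0.6, 0.8]
transfer_scores = [10.0, 20.0, 30.0, 40.0]
agreement = spearman(model_indices, transfer_scores)
```

The coefficient is undefined when either series is constant, so a set of devices with identical predicted or obtained transfer cannot be scored this way.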

In  addition  to  actual  field  studies,  laboratory  research  will  be  conduc¬ 
ted  to  test  the  predictive  power  of  model  variables  more  systematically.  The 
experimental  manipulation  of  these  variables  will  consist  of  locating  or 
constructing  devices  which  will  conform  to  extreme  as  well  as  moderate  vari¬ 
able  values.  It  is  hoped  that  the  examination  of  devices  which  are  markedly 
different  from  each  other  will  permit  the  emergence  of  reliable  effects.  A
major  problem  which  plagued  prior  validation  efforts  was  that  the  devices 
being  compared  were  not  significantly  different  in  the  amount  of  transfer 
predicted  or  obtained.  The  planned  approach  will  help  to  avoid  merely 
confirming  a  prediction  of  the  null  hypothesis. 

An  initial  milestone  is  to  develop  a  useable,  although  interim,  version
of  a  model  that  may  be  routinely  applied  to  training  devices  as  they  progress 
through  various  stages  in  the  acquisition  cycle.  While  not  expected  to  be 
perfect,  an  evaluation  approach  which  systematically  assesses  a  device, 
backed  up  by  guidance  on  its  interpretation,  seems  a  possible  reality  in  the
foreseeable  future. 

Since  the  application  of  any  of  the  models  reviewed  here  is  a  fairly 
burdensome  process,  an  associated  milestone  will  be  an  automated  (i.e., 
computer-based)  implementation.  The  form  the  implementation  is  expected  to
take  is  an  interactive  program  which  will:  1)  lead  the  user  through  the
model  procedures  and  guidance  relevant  to  each  judgment;  2)  maintain  records 
of  all  tasks,  equipment,  and  judgmental  rating  information;  and  3)  perform 
all  calculations  and  generate  hard  copy  of  predictive  indices.  This  strat¬ 
egy  should  permit  the  user  to  focus  almost  all  of  his  or  her  time  and  energy 
on  making  the  judgments,  which  promise  to  be  challenging  in  any  analytic 
model. 
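
A minimal sketch of the record-keeping and calculation functions (items 2 and 3 above) is given below. The interactive guidance of item 1 is omitted. The rating scale of 0 to 4, the class and function names, and the `scaled_mean` index are all hypothetical and stand in for whichever model is eventually implemented.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

Ratings = Dict[Tuple[str, str], int]


@dataclass
class RatingSession:
    """Keeps the judgmental ratings for one device and prints a report."""

    device: str
    scale: Tuple[int, int] = (0, 4)  # assumed rating scale bounds
    ratings: Ratings = field(default_factory=dict)

    def record(self, task: str, variable: str, value: int) -> None:
        """Store one rating, rejecting values outside the scale."""
        low, high = self.scale
        if not low <= value <= high:
            raise ValueError(f"rating {value} is outside {low}..{high}")
        self.ratings[(task, variable)] = value

    def report(self, index_fn: Callable[[Ratings], float]) -> str:
        """List every stored rating, then the index computed from them."""
        lines = [f"Device: {self.device}"]
        for (task, variable), value in sorted(self.ratings.items()):
            lines.append(f"  {task} / {variable}: {value}")
        lines.append(f"Index: {index_fn(self.ratings):.3f}")
        return "\n".join(lines)


def scaled_mean(ratings: Ratings, top: int = 4) -> float:
    """Hypothetical index: mean rating as a fraction of the scale maximum."""
    return sum(ratings.values()) / (top * len(ratings))


session = RatingSession("Device X")
session.record("load", "similarity", 4)
session.record("aim", "similarity", 2)
hard_copy = session.report(scaled_mean)
```

Because the index function is passed in rather than built into the session, the same stored ratings could be run through each model in turn.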

The  results  of  training  device  evaluations,  both  analytic  and  empirical, 
will  ultimately  be  incorporated  into  a  computer-based  management  information 
system.  As  the  data  base  contained  in  such  a  system  grows,  it  will  permit 
training  developers  and  researchers  to  track  the  history  of  individual  train¬ 
ing  devices  throughout  their  life  cycles,  from  initial  design  to  field  utili¬ 
zation.  Longitudinal  training  device  data  will,  eventually,  support  the 
continuous  validation  and  refinement  of  both  predictive  and  prescriptive 
methods. 

An  investigation  of  current  Army  procedures  followed  in  the  writing  of  a 
Training  Device  Requirements  document  (TDR)  will  also  be  performed  to  support 
the  development  of  prescriptive  methods.  As  mentioned  earlier,  Wheaton, 
Fingerman,  Rose,  and  Leonard  (1976)  identified  the  shortcomings  of  the  TDR  as 
the  major  limitation  on  an  early,  prescriptive  application  of  analytic  evalu¬ 
ation  methods.  Since  the  TDR  investigation  will  address  the  ways  in  which 
information  is  generated  and  used  during  the  acquisition  of  a  training  de¬ 
vice,  this  effort  is  expected  to  enhance  Army  utilization  of  device  evalua¬ 
tion  data,  and  to  improve  the  overall  quality  of  these  data. 

In  reviewing  the  TRAINVICE  models,  it  became  apparent  that  there  is  also 
a  need  for  a  thoroughgoing  re-examination  of  the  models’  underlying  assump¬ 
tions  about  which  characteristics  of  a  training  device  will  foster  effective 
training.  In  particular,  this  investigation  must  concern  the  applicability 
of  the  various  sets  of  Learning  Guidelines  to  specific  questions  of  device 
evaluation.  The  Learning  Guidelines  used  in  the  TRAINVICE  models  were  origi¬ 
nally  intended  to  aid  in  media  selection  decisions.  It  is  still  unknown, 
however,  whether  or  not  the  same  guidelines  are  of  sufficient  detail,  or 
validity,  to  be  of  use  in  the  evaluation  of  the  transfer  potential  of  a  par¬ 
ticular  training  device.  A  second  problem  which  needs  to  be  addressed  is  the 
assumption  that  each  of  the  guidelines  will  promote  transfer  of  training.  In 
some  cases  guidance  appears  to  be  directed  primarily  toward  enhancing  the 
rate  at  which  learning  takes  place,  and  in  others,  toward  increasing  skill 
retention.  Although  rate  of  learning,  retention,  and  transfer  are  all  con¬ 
sidered  measures  of  "good  training,"  they  are  not  always  similarly  affected 
by  the  same  variables.  For  example,  a  variable  which  increases  rate  of 
learning  may  have  no  effect  on  retention  (Underwood,  1964). 

Adequate  definitions  of  each  of  the  Learning  Guidelines  are  needed.  Such 
a  definition  would  consist  minimally  of  an  identification  of  the  manipulable 
parameters  (i.e.,  independent  variables)  implied  by  each  guideline,  and  the 
specific  effects  of  those  parameters  on  rate  of  learning,  retention,  and 
transfer  of  training.  It  is  certain  that  the  prediction  of  device  effectiveness
and  the  prescription  of  effective  devices  will  be  greatly  buttressed  by
the  guidance  which  results  from  this  effort.  First,  an  extensive  review  of 
the  research  literature,  both  basic  and  applied,  will  be  required  to  find
sources  supporting  each  guideline  and  to  identify  areas  in  which  new  empirical
research  is  needed.  Once  data  have  been  collected,  the  task  of  generating  new
guidelines  and  of  incorporating  them  into  device  evaluation  procedures  will
remain.  Clearly,  the  refinement  of  the  Learning  Guidelines  must  be  considered 
a  long-term  goal. 

To  recapitulate  briefly,  our  review  of  the  TRAINVICE  models  has  led  us  to 
the  following  general  conclusions.  Despite  their  various  limitations,  the
TRAINVICE  models  are  ambitious  and  promising  methods  for  the  analytic  evalu¬ 
ation  of  training  device  effectiveness.  The  evident  merits  of  these  models 
warrant  a  programmatic  series  of  validation  and,  eventually,  implementation 
efforts.  Any  significant  improvement  in  predictive  or  prescriptive  methods 
will  require  a  long-term  re-examination  of  the  principles  underlying  training 
device  effectiveness. 

The  scope  and  amount  of  work  outlined  above  are,  admittedly,  great.
However,  the  potential  utility  of  analytic  evaluation  methods  and  the  persis¬ 
tent  need  for  them  are  at  least  as  great. 

Has  received  a  complete  briefing  on  the  subject  or  skill.
step  of  the  operation.  Requires  much  more  training  and  experience.
Has  received  "familiarization"  training  only.